>_ TrueFileSize.com

Sample Parquet File Download — Free Apache Parquet for Testing

Download free Apache Parquet example files from 100 KB to 50 MB, in Snappy, GZIP, and uncompressed variants. These Parquet test files are built for data engineers and analysts working with Spark, pandas, BigQuery, Athena, DuckDB, and Snowflake. Use them to test data lake ingestion, ETL pipelines, and columnar query performance.

sample-100kb.parquet

101 KB

1,100 rows · SNAPPY

Verified file details
Filename: sample-100kb.parquet
Exact size: 103,210 bytes
Displayed size: 101 KB
MIME type: application/octet-stream
Rows: 1,100
Columns: 6
Codec: SNAPPY
Note: simple-flat
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-100kb.parquet

See how TrueFileSize generates and measures sample files, or review the editorial policy.
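A quick way to confirm a download is complete is to compare its size on disk with the "Exact size" on its card. This is a minimal, standard-library-only sketch; the helper names `download` and `matches_exact_size` are hypothetical, and the URL and byte count are copied from the sample-100kb.parquet card above.

```python
import os
import urllib.request

# Values copied from the sample-100kb.parquet card.
URL = "https://truefilesize.com/files/parquet/sample-100kb.parquet"
EXPECTED_BYTES = 103_210


def download(url: str, dest: str) -> str:
    """Fetch url into dest and return the local path (hypothetical helper)."""
    path, _headers = urllib.request.urlretrieve(url, dest)
    return path


def matches_exact_size(path: str, expected_bytes: int) -> bool:
    """True if the file on disk has exactly the advertised byte count (hypothetical helper)."""
    return os.path.getsize(path) == expected_bytes


# Usage (this makes a network request, so it is left commented out):
# path = download(URL, "sample-100kb.parquet")
# print(matches_exact_size(path, EXPECTED_BYTES))
```

A size match only shows the transfer was not truncated; it does not validate the Parquet structure itself.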

sample-500kb.parquet

509 KB

3,300 rows · SNAPPY

Verified file details
Filename: sample-500kb.parquet
Exact size: 520,983 bytes
Displayed size: 509 KB
MIME type: application/octet-stream
Rows: 3,300
Columns: 10
Codec: SNAPPY
Note: nested-schema
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-500kb.parquet

sample-1mb.parquet

1.05 MB

5,000 rows · GZIP

Verified file details
Filename: sample-1mb.parquet
Exact size: 1,105,453 bytes
Displayed size: 1.05 MB
MIME type: application/octet-stream
Rows: 5,000
Columns: 12
Codec: GZIP
Note: with-nulls
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-1mb.parquet

sample-5mb.parquet

5.14 MB

22,000 rows · SNAPPY

Verified file details
Filename: sample-5mb.parquet
Exact size: 5,393,543 bytes
Displayed size: 5.14 MB
MIME type: application/octet-stream
Rows: 22,000
Columns: 15
Codec: SNAPPY
Note: large-columns
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-5mb.parquet

sample-10mb.parquet

10.28 MB

44,000 rows · SNAPPY

Verified file details
Filename: sample-10mb.parquet
Exact size: 10,780,760 bytes
Displayed size: 10.28 MB
MIME type: application/octet-stream
Rows: 44,000
Columns: 15
Codec: SNAPPY
Note: production-like
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-10mb.parquet

sample-50mb.parquet

52.96 MB

150,000 rows · SNAPPY

Verified file details
Filename: sample-50mb.parquet
Exact size: 55,536,645 bytes
Displayed size: 52.96 MB
MIME type: application/octet-stream
Rows: 150,000
Columns: 20
Codec: SNAPPY
Note: stress-test
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-50mb.parquet

sample-uncompressed.parquet

199 KB

2,000 rows · NONE

Verified file details
Filename: sample-uncompressed.parquet
Exact size: 203,918 bytes
Displayed size: 199 KB
MIME type: application/octet-stream
Rows: 2,000
Columns: 8
Codec: NONE
Note: uncompressed
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-uncompressed.parquet

sample-gzip.parquet

298 KB

3,000 rows · GZIP

Verified file details
Filename: sample-gzip.parquet
Exact size: 305,642 bytes
Displayed size: 298 KB
MIME type: application/octet-stream
Rows: 3,000
Columns: 8
Codec: GZIP
Note: gzip-compressed
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-gzip.parquet

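The displayed sizes on the cards appear to use binary units: the exact byte count divided by 1,024 for KB (rounded to a whole number) and by 1,048,576 for MB (two decimals). That rule is inferred from the eight cards above rather than documented by the site; the sketch below, with a hypothetical `displayed_size` helper, reproduces every displayed value from its exact byte count.

```python
def displayed_size(num_bytes: int) -> str:
    """Format a byte count the way the cards appear to (binary units, hypothetical helper)."""
    kib = num_bytes / 1024
    if kib < 1024:
        return f"{kib:.0f} KB"       # whole KB below 1 MB
    return f"{kib / 1024:.2f} MB"    # two decimals from 1 MB upward


# Exact byte counts and displayed sizes copied from the cards.
CARDS = {
    "sample-100kb.parquet": (103_210, "101 KB"),
    "sample-500kb.parquet": (520_983, "509 KB"),
    "sample-1mb.parquet": (1_105_453, "1.05 MB"),
    "sample-5mb.parquet": (5_393_543, "5.14 MB"),
    "sample-10mb.parquet": (10_780_760, "10.28 MB"),
    "sample-50mb.parquet": (55_536_645, "52.96 MB"),
    "sample-uncompressed.parquet": (203_918, "199 KB"),
    "sample-gzip.parquet": (305_642, "298 KB"),
}

for name, (size, shown) in CARDS.items():
    print(f"{name}: computed {displayed_size(size)}, card shows {shown}")
```

Under decimal units (1 KB = 1,000 bytes) the first card would read 103 KB instead, which is why the binary interpretation seems to be the one in use.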

Use cases for sample Parquet files

  • Testing Parquet readers (pyarrow, DuckDB, Spark, pandas)
  • Benchmarking Parquet vs CSV read performance
  • Testing data lake ingestion pipelines (S3, GCS, ADLS)
  • Verifying Parquet schema evolution and compatibility
  • Testing BI tool Parquet import (Tableau, Power BI, Metabase)
  • Validating Snappy vs GZIP compression handling

Parquet vs CSV vs JSON for analytics

Feature | Parquet | CSV | JSON
Storage layout | Columnar | Row-based | Row-based
File size (1M rows, typical; data-dependent) | ~50 MB | ~200 MB | ~400 MB
Column pruning | Yes (reads only the needed columns) | No (reads all columns) | No (reads all fields)
Schema enforcement | Yes (typed columns) | No (all values are text) | Partial (basic JSON types only)
Predicate pushdown | Yes (row-group min/max statistics) | No | No
Human-readable | No (binary) | Yes | Yes
Best for | Analytics, data lakes, ML | Data exchange, imports | APIs, configs

How to read and write Parquet files

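# Python (pandas + pyarrow, the most common setup)
import pandas as pd
df = pd.read_parquet('data.parquet')
df.to_parquet('output.parquet', engine='pyarrow')

# Python (polars, a faster alternative)
import polars as pl
df = pl.read_parquet('data.parquet')

# DuckDB (SQL directly on Parquet files, no load step)
import duckdb
duckdb.sql("SELECT * FROM 'data.parquet' WHERE age > 30").show()
# my_table must be an existing DuckDB table (or a DataFrame variable in scope)
duckdb.sql("COPY (SELECT * FROM my_table) TO 'out.parquet' (FORMAT PARQUET)")

# Apache Spark (PySpark)
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# s3:// works on Amazon EMR; open-source Spark with the Hadoop S3A connector uses s3a://
df = spark.read.parquet("s3://bucket/data.parquet")

# CLI inspection (run in a shell, not Python)
parquet-tools schema data.parquet   # legacy Java parquet-tools
parquet-tools head data.parquet
pqrs schema data.parquet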

Parquet compression codecs

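Codec | Ratio | Speed | When to use
Snappy | Good | Very fast | Default in Spark and DuckDB; best balance for most workloads
GZIP | High | Slow | Long-term storage, bandwidth-limited transfers
ZSTD | High (comparable to GZIP) | Fast | Modern alternative to GZIP (Spark 3+)
None | ~1:1 (encodings such as dictionary and RLE still apply) | Fastest | Testing, data that is already compressed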

Technical specifications

Full name: Apache Parquet
Extension: .parquet
Type: Columnar binary storage format
Magic bytes: PAR1 (the first and last 4 bytes of the file)
Compression: Snappy (default in most writers), GZIP, ZSTD, LZ4, Brotli, None
Encoding: Dictionary, RLE, Delta, Bit-packing
Nested types: Dremel-style repetition/definition levels
Developed by: Twitter and Cloudera (2013); now an Apache project
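The PAR1 markers make a cheap sanity check before a file is handed to a full reader. This standard-library-only sketch also reads the 4-byte little-endian length stored just before the trailing magic bytes, which is the size of the Thrift-encoded file metadata footer. The function names are hypothetical, and a passing check does not prove the rest of the file is valid.

```python
import os
import struct

MAGIC = b"PAR1"


def has_parquet_magic(path: str) -> bool:
    """True if the file starts and ends with PAR1 (hypothetical helper)."""
    # 12 bytes = leading PAR1 + 4-byte footer length + trailing PAR1.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == MAGIC and tail == MAGIC


def footer_length(path: str) -> int:
    """Read the little-endian uint32 just before the trailing PAR1 (hypothetical helper).

    Its value is the byte length of the file metadata that precedes it.
    """
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        (length,) = struct.unpack("<I", f.read(4))
    return length
```

Because the metadata sits at the end, readers seek to the footer first; that is how tools list the schema and row groups without scanning the data pages.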

Frequently Asked Questions

What is Apache Parquet?
Apache Parquet is an open-source columnar storage format designed for efficient analytics on large datasets. Unlike CSV, which is row-based, Parquet stores data column by column. That layout enables column pruning, better compression (files are often 4-10x smaller than the equivalent CSV), and predicate pushdown. It is the de facto standard format for data lakes on S3, GCS, and ADLS, and it is supported by Spark, BigQuery, Athena, Snowflake, and most other major analytics platforms.
Parquet vs CSV — Which is better for data?
Parquet is usually much better for analytics: (1) files are often 4-10x smaller thanks to columnar compression; (2) column pruning lets readers load only the columns a query needs; (3) predicate pushdown skips irrelevant row groups using their min/max statistics; (4) schema enforcement means typed columns prevent many data errors. CSV is better for human readability and simple data exchange. As a rough, data-dependent illustration, a 10 GB CSV might shrink to about 1 GB as Parquet, with queries that touch only a few columns running 10-100x faster.
Parquet vs Avro — What is the difference?
Parquet is columnar and optimized for read-heavy analytics that select specific columns. Avro is row-based and optimized for write-heavy streaming and full-record access. Use Parquet for data warehouses, BI tools, and ad-hoc queries. Use Avro for Kafka event streams, data ingestion, and workloads where schema evolution between producers and consumers matters most. Many pipelines ingest data as Avro and then convert it to Parquet for analytics.
How do I read a Parquet file?
Python: pd.read_parquet('data.parquet') or polars.read_parquet('data.parquet'). SQL: DuckDB — SELECT * FROM 'data.parquet'. Spark: spark.read.parquet('s3://bucket/data.parquet'). CLI: parquet-tools head data.parquet or pqrs schema data.parquet. Cloud: BigQuery, Athena, and Snowflake query Parquet directly from S3/GCS.
Parquet compression types — Snappy vs GZIP vs ZSTD?
Snappy (the default in Spark, DuckDB, and pyarrow): very fast decompression, often 5-10x faster than GZIP, with files roughly 20% larger. Best for interactive queries in Spark, DuckDB, and Presto. GZIP: a high compression ratio but the slowest of the three. Best for cold storage and bandwidth-limited transfers. ZSTD: the best of both worlds, with compression close to GZIP at speeds close to Snappy. Supported in Spark 3+, Arrow, and DuckDB. Our sample files include Snappy, GZIP, and uncompressed variants.
How do I convert CSV to Parquet?
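Python: pd.read_csv('data.csv').to_parquet('data.parquet', engine='pyarrow'). DuckDB: COPY (SELECT * FROM 'data.csv') TO 'data.parquet' (FORMAT PARQUET). Spark: spark.read.csv('data.csv', header=True, inferSchema=True).write.parquet('output/'). Spark reads every CSV column as a string unless you enable inferSchema or pass an explicit schema. pandas and DuckDB infer types, but specifying column dtypes explicitly is still the safest way to get the schema you expect in the output Parquet file.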
