Sample Parquet File Download — Free Apache Parquet for Testing

Download free Apache Parquet example files from 100 KB to 50 MB in Snappy, GZIP, and uncompressed variants. These test files are built for data engineers and analysts working with Spark, pandas, BigQuery, Athena, DuckDB, and Snowflake. Use them to test data lake ingestion, ETL pipelines, and columnar query performance.

sample-100kb.parquet

101 KB

1,100 rows · SNAPPY

Verified file details

Filename: sample-100kb.parquet
Exact size: 103,210 bytes
Displayed size: 101 KB
MIME type: application/octet-stream
Rows: 1,100
Columns: 6
Codec: SNAPPY
Note: simple-flat
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-100kb.parquet

See how TrueFileSize generates and measures sample files, or review the editorial policy.

sample-500kb.parquet

509 KB

3,300 rows · SNAPPY

Verified file details

Filename: sample-500kb.parquet
Exact size: 520,983 bytes
Displayed size: 509 KB
MIME type: application/octet-stream
Rows: 3,300
Columns: 10
Codec: SNAPPY
Note: nested-schema
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-500kb.parquet

sample-1mb.parquet

1.05 MB

5,000 rows · GZIP

Verified file details

Filename: sample-1mb.parquet
Exact size: 1,105,453 bytes
Displayed size: 1.05 MB
MIME type: application/octet-stream
Rows: 5,000
Columns: 12
Codec: GZIP
Note: with-nulls
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-1mb.parquet

sample-5mb.parquet

5.14 MB

22,000 rows · SNAPPY

Verified file details

Filename: sample-5mb.parquet
Exact size: 5,393,543 bytes
Displayed size: 5.14 MB
MIME type: application/octet-stream
Rows: 22,000
Columns: 15
Codec: SNAPPY
Note: large-columns
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-5mb.parquet

sample-10mb.parquet

10.28 MB

44,000 rows · SNAPPY

Verified file details

Filename: sample-10mb.parquet
Exact size: 10,780,760 bytes
Displayed size: 10.28 MB
MIME type: application/octet-stream
Rows: 44,000
Columns: 15
Codec: SNAPPY
Note: production-like
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-10mb.parquet

sample-50mb.parquet

52.96 MB

150,000 rows · SNAPPY

Verified file details

Filename: sample-50mb.parquet
Exact size: 55,536,645 bytes
Displayed size: 52.96 MB
MIME type: application/octet-stream
Rows: 150,000
Columns: 20
Codec: SNAPPY
Note: stress-test
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-50mb.parquet

sample-uncompressed.parquet

199 KB

2,000 rows · NONE

Verified file details

Filename: sample-uncompressed.parquet
Exact size: 203,918 bytes
Displayed size: 199 KB
MIME type: application/octet-stream
Rows: 2,000
Columns: 8
Codec: NONE
Note: uncompressed
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-uncompressed.parquet

sample-gzip.parquet

298 KB

3,000 rows · GZIP

Verified file details

Filename: sample-gzip.parquet
Exact size: 305,642 bytes
Displayed size: 298 KB
MIME type: application/octet-stream
Rows: 3,000
Columns: 8
Codec: GZIP
Note: gzip-compressed
License: CC0 / Public Domain
Download URL: https://truefilesize.com/files/parquet/sample-gzip.parquet
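
The "Displayed size" values in the listings are consistent with dividing the exact byte count by 1024 (for KB) or by 1024 twice (for MB) and rounding. A minimal sketch of that arithmetic follows. The function name `display_size` and the rounding rules (whole KB, two-decimal MB) are assumptions inferred from the listed numbers, not a description of how the site computes its labels.

```python
def display_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    kb = num_bytes / 1024
    if kb < 1024:
        return f"{kb:.0f} KB"        # whole kilobytes (assumed rounding)
    return f"{kb / 1024:.2f} MB"     # two decimals for megabytes (assumed)


# Exact sizes copied from the file listings.
for name, size in [
    ("sample-100kb.parquet", 103_210),
    ("sample-1mb.parquet", 1_105_453),
    ("sample-50mb.parquet", 55_536_645),
]:
    print(name, display_size(size))
```

The same function can be pointed at `os.path.getsize(path)` after a download to confirm that the file on disk matches the listed exact size.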

Use cases for sample Parquet files

Testing Parquet readers (pyarrow, DuckDB, Spark, pandas)
Benchmarking Parquet vs CSV read performance
Testing data lake ingestion pipelines (S3, GCS, ADLS)
Verifying Parquet schema evolution and compatibility
Testing BI tool Parquet import (Tableau, Power BI, Metabase)
Validating Snappy vs GZIP compression handling

Parquet vs CSV vs JSON for analytics

Feature	Parquet	CSV	JSON
Storage layout	Columnar	Row-based	Row-based
File size (1M rows, illustrative)	~50 MB	~200 MB	~400 MB
Column pruning	Yes (read only needed cols)	No (read all)	No (read all)
Schema enforcement	Yes (typed columns)	No (all strings)	Partial
Predicate pushdown	Yes (row group stats)	No	No
Human readable	No (binary)	Yes	Yes
Best for	Analytics, data lakes, ML	Data exchange, imports	APIs, configs

How to read and write Parquet files

# Python (pandas + pyarrow — most common)
import pandas as pd
df = pd.read_parquet('data.parquet')
df.to_parquet('output.parquet', engine='pyarrow')

# Python (polars — faster alternative)
import polars as pl
df = pl.read_parquet('data.parquet')

# DuckDB (SQL directly on Parquet files, no import step)
import duckdb
duckdb.sql("SELECT * FROM 'data.parquet' WHERE age > 30").show()
duckdb.sql("COPY (SELECT * FROM my_table) TO 'out.parquet'")

# Apache Spark (assumes an existing SparkSession named `spark`)
df = spark.read.parquet("s3://bucket/data.parquet")

# CLI inspection (parquet-tools / pqrs)
parquet-tools schema data.parquet
parquet-tools head data.parquet
pqrs schema data.parquet

Parquet compression codecs

Codec	Ratio	Speed	When to use
Snappy	Good	Very fast	Default — best balance (Spark, DuckDB)
GZIP	Best	Slow	Long-term storage, bandwidth-limited
ZSTD	Best	Fast	Modern alternative to GZIP (Spark 3+)
None	1:1	Fastest	Testing, already-compressed data

Technical specifications

Full name	Apache Parquet
Extension	.parquet
Type	Columnar binary storage format
Magic bytes	PAR1 (header and footer)
Compression	Snappy (common writer default), GZIP, ZSTD, LZ4, Brotli, None
Encoding	Dictionary, RLE, Delta, Bit-packing
Nested types	Dremel-style repetition/definition levels
Developed by	Twitter + Cloudera (2013), Apache project
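
The magic bytes give a quick sanity check that a download is a Parquet file and was not truncated. A minimal sketch using only the standard library: it checks for `PAR1` at both ends and reads the 4-byte little-endian footer length stored just before the trailing marker. The function name `check_parquet_magic` is hypothetical, and the demo file only mimics the outer framing; it is not a readable Parquet file.

```python
import os
import struct
import tempfile

MAGIC = b"PAR1"


def check_parquet_magic(path: str) -> int:
    """Return the footer metadata length if both magic markers are present.

    Raises ValueError when the file is too short or a marker is missing.
    """
    size = os.path.getsize(path)
    if size < 12:  # 4 (leading magic) + 4 (footer length) + 4 (trailing magic)
        raise ValueError("file too small to be Parquet")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        footer_len_bytes = f.read(4)
        tail = f.read(4)
    if head != MAGIC or tail != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    return struct.unpack("<I", footer_len_bytes)[0]


# Stand-in bytes that imitate only the framing of a Parquet file.
fake_footer = b"\x00" * 10
demo_path = os.path.join(tempfile.mkdtemp(), "framing-demo.bin")
with open(demo_path, "wb") as f:
    f.write(MAGIC + b"column-data" + fake_footer
            + struct.pack("<I", len(fake_footer)) + MAGIC)

print(check_parquet_magic(demo_path))
```

A passing check does not prove the file is valid, only that the framing is intact; a reader such as pyarrow or DuckDB is still needed to validate the footer and data pages.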

Frequently Asked Questions

What is Apache Parquet?

Apache Parquet is an open-source columnar storage format designed for efficient analytics on large datasets. Unlike CSV, which is row-based, Parquet stores data column by column, which enables column pruning, better compression (often 4-10x smaller than CSV), and predicate pushdown. It is the de facto standard format for data lakes on S3/GCS/ADLS and is supported by Spark, BigQuery, Athena, Snowflake, and most major analytics platforms.

Parquet vs CSV — Which is better for data?

Parquet is usually the better choice for analytics: (1) Files are often 4-10x smaller thanks to columnar compression. (2) Column pruning: read only the columns you need. (3) Predicate pushdown: skip irrelevant row groups using min/max stats. (4) Schema enforcement: typed columns prevent data errors. CSV is better for human readability and simple data exchange. As a rough guide, a 10 GB CSV can shrink to about 1 GB as Parquet, and queries that touch only a few columns or row groups can run 10-100x faster, though actual results depend on the data and the query engine.
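
The idea behind predicate pushdown can be sketched without any Parquet library. The toy model below keeps min/max statistics per "row group" and skips groups that cannot contain a match for `age > threshold`. The `RowGroup` class and function names are hypothetical, and this is a simplified illustration of the principle, not how any particular reader is implemented.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group with min/max stats on one column."""
    min_age: int
    max_age: int
    ages: list


def make_group(ages):
    return RowGroup(min(ages), max(ages), ages)


def scan_age_greater_than(row_groups, threshold):
    """Return matching values and the number of row groups actually scanned."""
    matches, scanned = [], 0
    for rg in row_groups:
        # For a `>` predicate only the max matters: if even the largest
        # value fails the test, the whole group is skipped unread.
        if rg.max_age <= threshold:
            continue
        scanned += 1
        matches.extend(a for a in rg.ages if a > threshold)
    return matches, scanned


groups = [
    make_group([18, 22, 25]),
    make_group([27, 29, 30]),
    make_group([28, 35, 44]),
    make_group([51, 60, 72]),
]

matches, scanned = scan_age_greater_than(groups, 30)
print(matches)
print(f"scanned {scanned} of {len(groups)} row groups")
```

Skipping works best when the data is sorted or clustered on the filtered column, because the min/max ranges of the row groups then overlap less.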

Parquet vs Avro — What is the difference?

Parquet is columnar (optimized for read-heavy analytics that select specific columns). Avro is row-based (optimized for write-heavy streaming and full-record access). Use Parquet for data warehouses, BI tools, and ad-hoc queries. Use Avro for Kafka event streams, data ingestion, and schema evolution. Many pipelines ingest data as Avro and then convert it to Parquet for analytics.

How do I read a Parquet file?

Python: pd.read_parquet('data.parquet') or polars.read_parquet('data.parquet'). SQL: DuckDB, with SELECT * FROM 'data.parquet'. Spark: spark.read.parquet('s3://bucket/data.parquet'). CLI: parquet-tools head data.parquet or pqrs schema data.parquet. Cloud: BigQuery, Athena, and Snowflake can query Parquet files stored in cloud object storage such as S3 or GCS.

Parquet compression types — Snappy vs GZIP vs ZSTD?

Snappy (the default in most writers): very fast decompression (roughly 5-10x faster than GZIP) at the cost of files that are about 20% larger. Best for interactive queries in Spark/DuckDB/Presto. GZIP: high compression ratio but the slowest of the three. Best for cold storage and bandwidth-limited transfers. ZSTD: a middle ground, with compression close to GZIP at speeds close to Snappy. Supported in Spark 3+, Arrow, and DuckDB. Our sample files include Snappy, GZIP, and uncompressed variants.

How do I convert CSV to Parquet?

Python: pd.read_csv('data.csv').to_parquet('data.parquet', engine='pyarrow'). DuckDB: COPY (SELECT * FROM 'data.csv') TO 'data.parquet'. Spark: spark.read.csv('data.csv', header=True, inferSchema=True).write.parquet('output/'). Specify column types explicitly when converting: Spark reads every CSV column as a string unless you supply a schema or enable inferSchema, and type inference in any tool can mis-type columns such as zip codes or IDs.

Other data formats

CSV JSON SQLite SQL
