Skip to content
arctic

CLI

Every command and subcommand, with the flags that matter and one example each.

arctic <command> [args] [flags]

Run arctic <command> --help for the full flag list on any command. This page is the map. The global flags at the end apply to every command.

Commands

Command What it does
pull Download monthly bulk dumps from the torrent catalog
sub A community's history, torrent first then API
sub info What is imported locally for one subreddit
user An account's history through the Arctic Shift API
user info What is imported locally for one account
process Convert decompressed JSONL dumps into Parquet
query Filter imported records from one entity
catalog List the months the bulk torrent catalog covers
stats Summarize the local index of imports
publish Convert a month range and upload to Hugging Face
info Report detected hardware, the work budget, and storage paths
version Print version, commit, and build date

pull

arctic pull [months...] [flags]

Downloads the monthly RC_/RS_ dumps from the public torrent catalog. Name months as arguments: a single month, a list, or a 2024-01..2024-03 range. With no arguments it uses --from and --to. With --process each completed file is converted to Parquet as it lands.

Flag Default Meaning
--from catalog start First month (YYYY-MM) when no months are named
--to catalog end Last month (YYYY-MM) when no months are named
--type both comments, submissions, or both
--process off Convert each completed file to Parquet
arctic pull 2024-01..2024-03 --type comments --process

sub

arctic sub <subreddit> [flags]

Pulls a community's full history. Torrent-first: it downloads the per-subreddit bundle file when the community is in it, and falls back to the Arctic Shift API otherwise. --api forces the API path. The argument accepts golang, r/golang, or /r/golang.

Flag Default Meaning
--api off Force the Arctic Shift API path
--kind both comments, submissions, or both
--after none Earliest date (YYYY, YYYY-MM, or YYYY-MM-DD)
--before none Latest date
--no-import off Download only, skip the Parquet import
arctic sub golang --kind submissions --after 2020-01-01

sub info

arctic sub info <name>

Reports what is imported locally for one subreddit: shard count, row count, byte size, and the date span, per type. Exits 3 if nothing is imported.

arctic sub info golang

user

arctic user <username> [flags]

Pulls one account's full history through the Arctic Shift API (there is no per-account torrent). Takes the same --kind, --after, --before, and --no-import as sub. The argument accepts spez, u/spez, or /user/spez.

arctic user spez --kind comments --after 2023-01-01

user info

arctic user info <name>

Reports what is imported locally for one account, the same shape as sub info.

arctic user info spez

process

arctic process <file.zst>... [flags]

Converts one or more decompressed JSONL dumps into Parquet. It infers the record type from the file name: an RC_ or _comments name is comments, an RS_ or _submissions name is submissions. A file it cannot classify is skipped with a note.

Flag Default Meaning
--out alongside the input Output directory for the Parquet shards
arctic process RC_2024-01.zst RS_2024-01.zst --out ./parquet/2024-01

query

arctic query <subreddit|user> [flags]

Scans the Parquet shards of an entity you have already imported and filters them. A u/ prefix or --user reads the argument as an account; everything else is a subreddit.

Flag Default Meaning
--author none Filter by author
--contains none Case-insensitive substring in body/title/selftext
--min-score 0 Minimum score
--after none Earliest date (YYYY, YYYY-MM, or YYYY-MM-DD)
--before none Latest date
--kind both comments, submissions, or both
--user off Read the argument as a username
arctic query golang --contains generics --min-score 100 -n 20

catalog

arctic catalog [flags]

Lists the months the bulk torrent catalog covers, with a flag for whether each month is in the single large bundle. With --sizes it fetches the per-file byte sizes from the catalog over the network.

Flag Default Meaning
--sizes off Fetch per-file sizes from the catalog
arctic catalog --sizes

stats

arctic stats [flags]

Summarizes the local index of imported shards without rescanning the Parquet.

Flag Default Meaning
--by month Group by month, type, or subreddit
arctic stats --by type

publish

arctic publish [flags]

Processes a month range into Parquet and uploads it to a Hugging Face dataset. Reads the token from HF_TOKEN. Without --commit it runs the full pipeline but skips the upload. A stalled commit exits 75 so a supervisor can restart and resume from the stats.csv ledger.

Flag Default Meaning
--from catalog start First month (YYYY-MM)
--to catalog end Last month (YYYY-MM)
--type both comments, submissions, or both
--repo default dataset Hugging Face dataset repo
--commit off Upload (default is a dry run)
--private off Create the dataset repo as private
--keep off Keep local Parquet after a successful commit
arctic publish --from 2024-01 --to 2024-03 --commit

info

arctic info

Reports the detected hardware, the work budget arctic derives from it, the active engine and whether DuckDB is available, and the resolved storage paths.

arctic info

version

arctic version

Prints the version, commit, and build date.

arctic version

Global flags

These apply to every command. See configuration for the defaults and the environment.

Flag Meaning
-o, --output table, json, jsonl, csv, tsv, url, raw (auto = table on a TTY, jsonl piped)
--fields Comma-separated columns to include
--no-header Omit the header row in table/csv/tsv
--template Go text/template applied per record
--color auto, always, or never
-n, --limit Maximum records (0 means no limit)
-q, --quiet Suppress progress on stderr
-j, --workers Concurrent workers (0 uses the hardware budget)
--data-dir Root for per-entity imports and the index (env ARCTIC_DATA_DIR)
--raw-dir Where downloaded .zst files land
--work-dir Scratch for conversion
--engine Conversion engine: go or duckdb
--chunk-lines JSONL lines per Parquet shard
--timeout Per-request timeout for the API path
--user-agent Override the API request User-Agent