Subreddits and users
Acquire one community or one account: the torrent-first sub path, the API user path, the date and type filters, and the info subcommands.
For a single subreddit or a single account, the full-history bulk torrents are the wrong tool: you would download terabytes to keep a sliver. The entity path acquires just one community or one account and imports it into the same Parquet store the bulk path uses.
A community: arctic sub
arctic sub golang
sub pulls a community's full history. It is torrent-first: arctic checks the
per-subreddit torrent bundle, and if the community is in it, downloads that file
directly. If the community is not in the bundle, or the catalog check fails, it
falls back to streaming the records from the Arctic Shift API, page by page, at
the rate the service asks for. It prints the path it took to stderr, so you can
see whether r/golang came from a torrent or the API.
The argument is forgiving about prefixes: golang, r/golang, and /r/golang
all resolve to the same community.
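The prefix handling can be pictured with a small shell function. This is only an illustrative sketch: normalize_sub is a hypothetical helper, not part of arctic, and it models nothing beyond the three spellings listed above.

```shell
# Hypothetical helper, not part of arctic: reduce golang, r/golang,
# and /r/golang to the bare community name.
normalize_sub() {
  name=$1
  name=${name#/}    # drop one leading slash, if present
  name=${name#r/}   # drop the r/ prefix, if present
  printf '%s\n' "$name"
}

normalize_sub golang
normalize_sub r/golang
normalize_sub /r/golang
```

Each of the three calls prints golang. A name that merely starts with the letter r, such as rust, passes through unchanged because only the literal r/ prefix is removed.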
Force the API
To skip the torrent check and go straight to the API, pass --api. This is the
path to use for a small or new community that is unlikely to be in the bundle, or
when you want the API's exact date bounds rather than a whole torrent file:
arctic sub golang --api --after 2024-01-01
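The choice between the two paths, as described above, can be modeled as a three-input decision. The function below is a sketch under assumptions: choose_path and its yes/no/ok/failed arguments are hypothetical, and it mirrors the documented behavior rather than arctic's actual code.

```shell
# Hypothetical model of the path choice, not arctic's implementation.
#   $1: "yes" if --api was passed, otherwise "no"
#   $2: result of the catalog check, "ok" or "failed"
#   $3: "yes" if the community is in the per-subreddit bundle
choose_path() {
  force_api=$1
  catalog=$2
  in_bundle=$3
  if [ "$force_api" = yes ]; then
    echo api        # --api skips the torrent check entirely
  elif [ "$catalog" = ok ] && [ "$in_bundle" = yes ]; then
    echo torrent    # community found in the bundle
  else
    echo api        # not in the bundle, or the catalog check failed
  fi
}

choose_path no ok yes       # prints: torrent
choose_path no failed yes   # prints: api
choose_path yes ok yes      # prints: api
```

The torrent is used only when the catalog check succeeds and the community is present; every other combination ends at the API.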
Type and date filters
Narrow what you pull with --kind and a date window. --kind takes comments,
submissions, or both (the default). --after and --before take a date as
YYYY, YYYY-MM, or YYYY-MM-DD:
arctic sub golang --kind submissions
arctic sub golang --after 2020-01-01 --before 2021-01-01
arctic sub golang --kind comments --after 2024
The date bounds apply on the API path, which serves an exact range. The torrent path downloads the whole community file, so there you filter by date at query time.
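The three accepted date shapes can be told apart with a plain pattern match, which is handy for validating a value in a wrapper script before calling arctic. The sketch below assumes nothing about how arctic itself parses dates: date_form is a hypothetical helper, and it checks only the shape of the string, not whether the month or day is a real calendar value.

```shell
# Hypothetical validator, not part of arctic: name the precision of a
# date bound, or fail if it matches none of the three documented shapes.
# Shape check only: a string like 2024-13-45 would still pass.
date_form() {
  case $1 in
    [0-9][0-9][0-9][0-9]) echo year ;;
    [0-9][0-9][0-9][0-9]-[0-9][0-9]) echo month ;;
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) echo day ;;
    *) echo "unrecognized date: $1" >&2; return 1 ;;
  esac
}

date_form 2024         # prints: year
date_form 2024-01      # prints: month
date_form 2020-01-01   # prints: day
```

Anything else, such as 01/02/2024, writes a message to stderr and returns a non-zero status.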
Download without importing
--no-import downloads the data but skips the Parquet conversion, which is
useful when you want the raw .zst or JSONL to feed something else:
arctic sub golang --no-import
An account: arctic user
arctic user spez
user pulls one account's full history. There is no per-account torrent, so this
always goes through the Arctic Shift API. It takes the same --kind, --after,
--before, and --no-import flags as sub, and the same forgiving argument:
spez, u/spez, and /user/spez all resolve to the same account.
arctic user spez --kind comments --after 2023-01-01
What you hold: the info subcommands
Both commands carry an info subcommand that reports what is imported locally for
one entity:
arctic sub info golang
arctic user info spez
Each prints one row per record type with the shard count, row count, byte size, and the first and last dates covered. The counts come from the Parquet footers, so it is a metadata read and does not scan the row data. If nothing is imported for that entity, it exits with code 3 (no data).
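Because the no-data case has its own exit code, a script can tell "nothing imported yet" apart from a real failure. The wrapper below is a minimal sketch: report_info is a hypothetical name, and treating exit code 0 as success is an assumption based on the usual convention, since only code 3 is documented here.

```shell
# Hypothetical wrapper, not part of arctic: run an info command, discard
# its output, and translate the exit status into a short label.
# Assumption: exit code 0 means data was found; only 3 is documented.
report_info() {
  status=0
  "$@" >/dev/null 2>&1 || status=$?
  case $status in
    0) echo "imported" ;;
    3) echo "no data" ;;
    *) echo "error (exit $status)" ;;
  esac
}

# Intended use (requires arctic on PATH):
#   report_info arctic sub info golang
#   report_info arctic user info spez
```

The wrapper takes the whole command as its arguments, so the same function serves both sub info and user info.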
Query what you acquired
Once a community or account is imported, read it back with query. A u/ prefix
or --user tells query to read the argument as an account; everything else is
read as a subreddit:
arctic query golang --contains generics
arctic query spez --user --kind comments -n 50
See the querying guide for every filter.
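The rule for how query reads its argument is short enough to write out. This is a sketch of the documented rule only: query_target is a hypothetical function, and it covers just the u/ prefix and the --user flag mentioned above.

```shell
# Hypothetical model, not part of arctic: decide how query reads its argument.
#   $1: the argument as typed
#   $2: "--user" if that flag was passed, otherwise omitted
query_target() {
  if [ "${2:-}" = "--user" ]; then
    echo account
    return 0
  fi
  case $1 in
    u/*) echo account ;;
    *)   echo subreddit ;;
  esac
}

query_target golang         # prints: subreddit
query_target u/spez         # prints: account
query_target spez --user    # prints: account
```

Without the prefix or the flag, a bare name such as spez is read as a subreddit, which is why the second query example above passes --user.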
Where it all lands
Per-entity imports live under the data directory, keyed by kind and name, with
the index recording each one. Point the data directory elsewhere with
--data-dir or ARCTIC_DATA_DIR. The API path sends a default User-Agent you
can override with --user-agent, and honors --timeout for each request. See
configuration.