discrawl mirrors Discord guild data into local SQLite so you can search, inspect, and query server history without depending on Discord search.
It is a bot-token crawler. No user-token hacks. Data stays local.
What It Does
- discovers every guild the configured bot can access
- syncs channels, threads, members, and message history into SQLite
- maintains FTS5 search indexes for fast local text search
- builds an offline member directory from archived profile payloads
- extracts small text-like attachments into the local search index
- records structured user and role mentions for direct querying
- tails Gateway events for live updates, with periodic repair syncs
- exposes read-only SQL for ad hoc analysis
- keeps the schema multi-guild-ready while preserving a simple single-guild default UX
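The search and mention features above can be pictured with plain SQLite. This is a minimal sketch, not discrawl's schema. The tables messages, messages_fts, and mentions, their columns, and every helper function are hypothetical. It shows three things: an FTS5 index keyed by message ID, a structured mention lookup that needs no text matching, and a connection switched to query-only mode in the spirit of read-only ad hoc SQL.

```python
import sqlite3

# Hypothetical schema for illustration only; discrawl's real tables differ.
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,        -- Discord snowflake
    channel_id INTEGER NOT NULL,
    author_id TEXT NOT NULL,
    content TEXT NOT NULL
);
-- FTS5 index whose rowid is the message ID.
CREATE VIRTUAL TABLE messages_fts USING fts5(content);
CREATE TABLE mentions (
    message_id INTEGER NOT NULL,
    target_type TEXT NOT NULL,     -- 'user' or 'role'
    target_id TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory archive with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        (1, 10, "alice", "deploy failed on staging"),
        (2, 10, "bob", "staging deploy fixed, thanks alice"),
        (3, 11, "carol", "lunch anyone?"),
    ]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    # Index each message's text under its own ID.
    conn.executemany(
        "INSERT INTO messages_fts (rowid, content) VALUES (?, ?)",
        [(msg_id, content) for msg_id, _, _, content in rows],
    )
    # Structured mention: message 2 mentions user "alice".
    conn.execute("INSERT INTO mentions VALUES (2, 'user', 'alice')")
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[int]:
    """FTS5 MATCH query, best matches first (lower bm25 is better)."""
    sql = (
        "SELECT rowid FROM messages_fts WHERE messages_fts MATCH ? "
        "ORDER BY bm25(messages_fts)"
    )
    return [row[0] for row in conn.execute(sql, (query,))]


def mentioning(conn: sqlite3.Connection, user_id: str) -> list[int]:
    """Messages that mention a user, found via the mentions table."""
    sql = (
        "SELECT m.id FROM messages AS m "
        "JOIN mentions AS x ON x.message_id = m.id "
        "WHERE x.target_type = 'user' AND x.target_id = ? ORDER BY m.id"
    )
    return [row[0] for row in conn.execute(sql, (user_id,))]


def make_read_only(conn: sqlite3.Connection) -> None:
    """Reject any further writes on this connection."""
    conn.execute("PRAGMA query_only = ON")
```

Keying the FTS5 rowid to the message ID lets search hits join straight back to message rows. The SQLite library Python links against must have FTS5 enabled for this to run.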
Add optional turbovec semantic-search scoring via [search.embeddings].vector_backend, while keeping exact cosine as the default backend. Thanks @vincentkoc.
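For context, "exact cosine" means scoring the query embedding against every stored embedding. The sketch below is a generic brute-force ranking in plain Python. It assumes embeddings are equal-length float lists keyed by hypothetical document IDs. It is not discrawl's code, and turbovec's own scoring is not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Score every stored vector against the query and return the k best IDs."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    # Highest similarity first; ties broken by ID for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

Because every vector is scored, results are exact, but the cost grows linearly with the number of stored embeddings.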
Add the Homebrew install command to the discrawl.sh landing-page hero and the agent docs index, with a one-row desktop layout and a copy button.
Update crawlkit to v0.12.0.
Add read-only Cloudflare remote archive scaffolding: [remote] config,
subscribe-cloud, GitHub-backed remote login (OAuth or token-env
bootstrap), remote status, remote archives, and cloud-mode status --json
output that neither opens nor creates a local SQLite database.
Route cloud-mode search and filtered messages reads to named Worker
queries, so subscribers can inspect live D1 data without local SQLite.
Add discrawl cloud publish to export non-DM local SQLite rows into the
Cloudflare remote archive ingest API without changing Git snapshot
publishing.
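Excluding DMs comes down to a channel-type filter. Discord's API uses channel type 1 for DMs and 3 for group DMs. The sketch below assumes a hypothetical archive with channels and messages tables and a hypothetical {"rows": [...]} payload. It only shows how non-DM rows might be selected and batched for an ingest API, not what discrawl cloud publish actually sends.

```python
import json
import sqlite3

# Discord API channel types for private conversations: 1 = DM, 3 = GROUP_DM.
DM_CHANNEL_TYPES = (1, 3)


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical archive with one guild text channel and one DM channel."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE channels (id INTEGER PRIMARY KEY, type INTEGER NOT NULL);
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            channel_id INTEGER NOT NULL REFERENCES channels(id),
            content TEXT NOT NULL
        );
        INSERT INTO channels VALUES (10, 0), (20, 1);
        INSERT INTO messages VALUES
            (1, 10, 'hello guild'), (2, 20, 'private'),
            (3, 10, 'second'), (4, 10, 'third');
        """
    )
    return conn


def non_dm_batches(conn: sqlite3.Connection, batch_size: int):
    """Yield JSON payloads holding at most batch_size non-DM message rows."""
    cur = conn.execute(
        "SELECT m.id, m.channel_id, m.content FROM messages AS m "
        "JOIN channels AS c ON c.id = m.channel_id "
        "WHERE c.type NOT IN (?, ?) ORDER BY m.id",
        DM_CHANNEL_TYPES,
    )
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield json.dumps(
            {"rows": [{"id": i, "channel_id": ch, "content": text} for i, ch, text in rows]}
        )
```

Filtering in the SELECT means DM content never leaves the database cursor, so no later step has to remember to drop it.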
Mirror the non-DM local SQLite archive into the Worker-backed R2 object store
during discrawl cloud publish, alongside the D1 row ingest used for live
queries.
Compress the sanitized SQLite mirror as a gzip chunk bundle with an explicit
privacy/count manifest before uploading to R2.
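A chunked gzip bundle with a manifest can be sketched with the Python standard library. Everything here is an assumption for illustration: the manifest field names (privacy, counts, total_raw_bytes, chunks), the per-chunk SHA-256, and the chunk size are not taken from discrawl.

```python
import gzip
import hashlib


def build_bundle(data: bytes, chunk_size: int, row_counts: dict[str, int]):
    """Split data into gzip-compressed chunks and describe them in a manifest.

    Manifest field names are hypothetical, not discrawl's actual format.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: list[bytes] = []
    entries = []
    for offset in range(0, len(data), chunk_size):
        raw = data[offset:offset + chunk_size]
        # mtime=0 keeps the gzip header, and therefore the output, reproducible.
        packed = gzip.compress(raw, mtime=0)
        chunks.append(packed)
        entries.append({
            "index": len(entries),
            "raw_bytes": len(raw),
            "sha256": hashlib.sha256(packed).hexdigest(),
        })
    manifest = {
        "privacy": {"direct_messages_included": False},  # hypothetical flag
        "counts": dict(row_counts),
        "total_raw_bytes": len(data),
        "chunks": entries,
    }
    return manifest, chunks


def restore_bundle(manifest: dict, chunks: list[bytes]) -> bytes:
    """Verify each chunk against the manifest and reassemble the original bytes."""
    if len(chunks) != len(manifest["chunks"]):
        raise ValueError("chunk count does not match manifest")
    parts = []
    for entry, packed in zip(manifest["chunks"], chunks):
        if hashlib.sha256(packed).hexdigest() != entry["sha256"]:
            raise ValueError(f"checksum mismatch in chunk {entry['index']}")
        raw = gzip.decompress(packed)
        if len(raw) != entry["raw_bytes"]:
            raise ValueError(f"size mismatch in chunk {entry['index']}")
        parts.append(raw)
    data = b"".join(parts)
    if len(data) != manifest["total_raw_bytes"]:
        raise ValueError("total size does not match manifest")
    return data
```

Recording sizes and hashes per chunk lets a reader reject a truncated or corrupted upload before restoring anything.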
Fixes
Kept resumed sync --full backfills from moving channel latest-message checkpoints backward, avoiding duplicate head recrawls on large interrupted channels. Thanks @hannesrudolph.
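One way to keep a checkpoint from moving backward is to let the database enforce it. This sketch assumes a hypothetical sync_state table holding one latest-message ID per channel, and uses an SQLite upsert (SQLite 3.24 or later) that keeps the larger of the stored and incoming IDs. Because Discord snowflake IDs increase over time, a backfill of older messages cannot rewind the stored head.

```python
import sqlite3
from typing import Optional


def open_state() -> sqlite3.Connection:
    """In-memory stand-in for a per-channel sync-state table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sync_state ("
        " channel_id INTEGER PRIMARY KEY,"
        " latest_message_id INTEGER NOT NULL)"
    )
    return conn


def record_latest(conn: sqlite3.Connection, channel_id: int, message_id: int) -> None:
    """Upsert the checkpoint, but never move it to an older message.

    In DO UPDATE, the bare column is the stored value and excluded.* is the
    incoming one; max() keeps whichever snowflake is newer.
    """
    conn.execute(
        """
        INSERT INTO sync_state (channel_id, latest_message_id) VALUES (?, ?)
        ON CONFLICT(channel_id) DO UPDATE
        SET latest_message_id = max(latest_message_id, excluded.latest_message_id)
        """,
        (channel_id, message_id),
    )
    conn.commit()


def latest(conn: sqlite3.Connection, channel_id: int) -> Optional[int]:
    """Return the stored checkpoint for a channel, or None if it has none."""
    row = conn.execute(
        "SELECT latest_message_id FROM sync_state WHERE channel_id = ?",
        (channel_id,),
    ).fetchone()
    return row[0] if row else None
```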
Made messages --sync fail fast with a hint to omit --sync when a live tail process owns the sync lock, while plain messages reads continue without waiting. Thanks @jeanmonet.
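The fail-fast behavior resembles a non-blocking file lock. The sketch below is POSIX-only and hypothetical: the lock path, SyncLockBusy, and the message wording are illustrative, and the source does not describe discrawl's actual locking mechanism.

```python
import fcntl
import os


class SyncLockBusy(RuntimeError):
    """Raised when another process (for example a live tail) holds the sync lock."""


def acquire_sync_lock(path: str) -> int:
    """Take an exclusive lock without waiting; fail fast if it is already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock raise immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise SyncLockBusy(
            "another process owns the sync lock; rerun without --sync "
            "to read the local archive"
        ) from None
    return fd


def release_sync_lock(fd: int) -> None:
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A plain read that never calls acquire_sync_lock proceeds regardless of who holds the lock, which mirrors the split between messages --sync and plain messages reads.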