Automate Website Archiving with a Screenshot API

Published February 22, 2026 · 10 min read

The Wayback Machine is great, but it doesn't always capture what you need, when you need it. If you're tracking competitor landing pages, preserving evidence for legal compliance, or just keeping a visual history of your own site, you need something you control.

In this guide, we'll build an automated website archiving system that captures full-page screenshots on a schedule. You'll get a personal visual archive with timestamped snapshots, organized by domain, ready for comparison or compliance review.

Why Visual Archives Matter

Text-based archives (HTML dumps, WARC files) are useful, but they miss the point. Websites are visual. A CSS change, a swapped hero image, or a removed pricing tier won't show up in a text diff. But in a screenshot? Instantly obvious.

Common use cases for visual website archiving:

  - Competitor tracking - landing pages, pricing tiers, and messaging changes over time.
  - Legal and compliance records - timestamped evidence of what a page actually displayed.
  - Partner monitoring - confirming your listing or branding is still live on third-party sites.
  - Site history - a visual changelog of your own pages across redesigns.

The Architecture

Our archiving system is simple: a list of URLs, a cron schedule, and a screenshot API to do the heavy lifting. Screenshots get saved to organized folders with timestamps in the filename.

archive/
├── example-com/
│   ├── 2026-02-22T08-00-00.png
│   ├── 2026-02-22T20-00-00.png
│   └── 2026-02-23T08-00-00.png
├── competitor-io/
│   ├── 2026-02-22T08-00-00.png
│   └── ...
└── manifest.json

Quick Start: One-Off Archive Capture

Before building the full system, let's capture a single archived screenshot with curl:

# Capture a full-page screenshot and save with timestamp
curl "https://grabshot.dev/api/screenshot?url=https://example.com&fullPage=true&width=1440" \
  -H "X-API-Key: YOUR_API_KEY" \
  -o "example-com_$(date +%Y-%m-%dT%H-%M-%S).png"

The fullPage=true parameter captures the entire page, not just the viewport. This is critical for archiving since you want the complete content, not a cropped view.
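The same one-off capture works in Python. This is a sketch against the endpoint and X-API-Key header shown in the curl example; `snapshot_name` is a small helper introduced here for the timestamped filename:

```python
import os
import re
from datetime import datetime

def snapshot_name(url: str) -> str:
    """Build a timestamped filename like example-com_2026-02-22T08-00-00.png."""
    host = re.sub(r"^https?://", "", url).split("/")[0].replace(".", "-")
    ts = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    return f"{host}_{ts}.png"

def capture_once(url: str, out_dir: str = ".") -> str:
    """Fetch one full-page screenshot and save it with a timestamped name."""
    import requests  # deferred so snapshot_name stays usable offline
    resp = requests.get(
        "https://grabshot.dev/api/screenshot",
        params={"url": url, "fullPage": "true", "width": "1440"},
        headers={"X-API-Key": os.environ["GRABSHOT_API_KEY"]},
        timeout=60,
    )
    resp.raise_for_status()
    path = os.path.join(out_dir, snapshot_name(url))
    with open(path, "wb") as f:
        f.write(resp.content)
    return path
```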

Node.js: Full Archiving Script

Here's a complete Node.js script that archives multiple URLs, organizes them by domain, and maintains a manifest file for easy lookup:

const fs = require('fs');
const path = require('path');
const https = require('https');

const API_KEY = process.env.GRABSHOT_API_KEY;
const BASE_URL = 'https://grabshot.dev/api/screenshot';
const ARCHIVE_DIR = './archive';

// URLs to archive
const targets = [
  { url: 'https://competitor.com', label: 'competitor-homepage' },
  { url: 'https://competitor.com/pricing', label: 'competitor-pricing' },
  { url: 'https://your-site.com/terms', label: 'own-terms-of-service' },
  { url: 'https://partner.com/our-listing', label: 'partner-listing' },
];

function slugify(url) {
  return new URL(url).hostname.replace(/\./g, '-');
}

function timestamp() {
  return new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
}

async function captureScreenshot(url) {
  const params = new URLSearchParams({
    url,
    fullPage: 'true',
    width: '1440',
    format: 'png',
  });

  return new Promise((resolve, reject) => {
    const req = https.get(`${BASE_URL}?${params}`, {
      headers: { 'X-API-Key': API_KEY },
    }, (res) => {
      // Reject on non-200 so API errors don't get saved as broken PNGs
      if (res.statusCode !== 200) {
        res.resume(); // drain the response so the socket is freed
        reject(new Error(`API returned HTTP ${res.statusCode} for ${url}`));
        return;
      }
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve(Buffer.concat(chunks)));
      res.on('error', reject);
    });
    req.on('error', reject);
  });
}

async function archiveAll() {
  const manifest = [];
  const ts = timestamp();

  for (const target of targets) {
    const slug = slugify(target.url);
    const dir = path.join(ARCHIVE_DIR, slug);
    fs.mkdirSync(dir, { recursive: true });

    try {
      console.log(`Capturing: ${target.url}`);
      const screenshot = await captureScreenshot(target.url);
      const filename = `${ts}_${target.label}.png`;
      const filepath = path.join(dir, filename);

      fs.writeFileSync(filepath, screenshot);
      console.log(`  Saved: ${filepath} (${(screenshot.length / 1024).toFixed(0)} KB)`);

      manifest.push({
        url: target.url,
        label: target.label,
        captured: new Date().toISOString(),
        file: filepath,
        size: screenshot.length,
      });
    } catch (err) {
      console.error(`  Failed: ${target.url} - ${err.message}`);
    }

    // Be polite: wait 2 seconds between captures
    await new Promise((r) => setTimeout(r, 2000));
  }

  // Append to manifest
  const manifestPath = path.join(ARCHIVE_DIR, 'manifest.json');
  let existing = [];
  if (fs.existsSync(manifestPath)) {
    existing = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
  }
  existing.push(...manifest);
  fs.writeFileSync(manifestPath, JSON.stringify(existing, null, 2));

  console.log(`\nArchived ${manifest.length}/${targets.length} URLs.`);
}

archiveAll();

Run it with: GRABSHOT_API_KEY=your_key node archive.js
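Once the manifest has a few runs in it, you can query it for a per-URL timeline. A hypothetical helper (the entry keys match what the script above writes: url, label, captured, file, size):

```python
import json
from pathlib import Path

def timeline(manifest_path: str, url: str) -> list[dict]:
    """Return all manifest entries for `url`, oldest capture first."""
    entries = json.loads(Path(manifest_path).read_text())
    return sorted(
        (e for e in entries if e["url"] == url),
        key=lambda e: e["captured"],
    )
```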

Python: Archive with Comparison

This Python version goes a step further: it compares each new screenshot against the previous one and flags visual changes. Useful for detecting when a competitor updates their pricing or a partner removes your branding.

import os
import json
import hashlib
import requests
from datetime import datetime
from pathlib import Path

API_KEY = os.environ["GRABSHOT_API_KEY"]
ARCHIVE_DIR = Path("./archive")
API_URL = "https://grabshot.dev/api/screenshot"

TARGETS = [
    {"url": "https://competitor.com/pricing", "label": "competitor-pricing"},
    {"url": "https://your-site.com/terms", "label": "terms-of-service"},
]


def capture(url: str) -> bytes:
    resp = requests.get(API_URL, params={
        "url": url,
        "fullPage": "true",
        "width": "1440",
        "format": "png",
    }, headers={"X-API-Key": API_KEY})
    resp.raise_for_status()
    return resp.content


def file_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def get_previous(directory: Path) -> bytes | None:
    files = sorted(directory.glob("*.png"))
    if not files:
        return None
    return files[-1].read_bytes()


def archive():
    ts = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    changes = []

    for target in TARGETS:
        slug = target["url"].split("//")[1].replace("/", "_").replace(".", "-")
        dest = ARCHIVE_DIR / slug
        dest.mkdir(parents=True, exist_ok=True)

        print(f"Capturing: {target['url']}")
        screenshot = capture(target["url"])
        new_hash = file_hash(screenshot)

        # Compare with previous capture
        previous = get_previous(dest)
        changed = True
        if previous and file_hash(previous) == new_hash:
            changed = False
            print(f"  No visual change detected.")

        # Always save (even if unchanged, for timeline completeness)
        filename = f"{ts}_{target['label']}.png"
        filepath = dest / filename
        filepath.write_bytes(screenshot)
        print(f"  Saved: {filepath} ({len(screenshot) // 1024} KB)")

        if changed and previous:
            changes.append(target["url"])
            print(f"  ⚠️  CHANGE DETECTED vs previous capture!")

    if changes:
        print(f"\n🔔 {len(changes)} page(s) changed:")
        for url in changes:
            print(f"   - {url}")


if __name__ == "__main__":
    archive()

Scheduling with Cron

The real power comes from automation. Set up a cron job to run your archiving script at regular intervals:

# Archive every 12 hours
0 */12 * * * cd /home/user/web-archive && GRABSHOT_API_KEY=your_key node archive.js >> archive.log 2>&1

# Archive daily at midnight
0 0 * * * cd /home/user/web-archive && GRABSHOT_API_KEY=your_key python3 archive.py >> archive.log 2>&1

# Archive competitor pricing every 6 hours
0 */6 * * * cd /home/user/web-archive && GRABSHOT_API_KEY=your_key node archive.js --target=pricing >> archive.log 2>&1

For more complex scheduling needs (like different frequencies for different URLs), you can use the GrabShot API docs to set up webhook callbacks that trigger additional processing when a capture completes.
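Short of webhooks, per-URL frequencies can also run from a single hourly cron entry plus a small dispatcher that checks which targets are due. A sketch, assuming you track each URL's last capture time (e.g. from the manifest) and add a hypothetical interval_hours field per target:

```python
from datetime import datetime, timedelta

def due_targets(targets, last_captured, now=None):
    """Return targets whose interval has elapsed since their last capture.

    targets: list of {"url": ..., "interval_hours": ...}
    last_captured: dict mapping url -> datetime of the most recent capture
    """
    now = now or datetime.now()
    due = []
    for t in targets:
        last = last_captured.get(t["url"])
        if last is None or now - last >= timedelta(hours=t["interval_hours"]):
            due.append(t)
    return due
```

Run this hourly, capture only what `due_targets` returns, and each URL keeps its own cadence without separate cron lines.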

Storage and Retention

Screenshots add up. A full-page capture of a typical website is 500 KB to 2 MB. Here's a quick estimate:

URLs | Frequency | Daily Storage | Monthly Storage
-----|-----------|---------------|----------------
10   | Every 12h | ~20 MB        | ~600 MB
10   | Every 6h  | ~40 MB        | ~1.2 GB
50   | Daily     | ~50 MB        | ~1.5 GB
100  | Every 12h | ~200 MB       | ~6 GB
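The table's numbers are simple multiplication; a sketch assuming ~1 MB per full-page capture:

```python
def storage_mb(urls: int, captures_per_day: int, avg_mb: float = 1.0):
    """Return (daily_mb, monthly_mb) for a given archiving cadence."""
    daily = urls * captures_per_day * avg_mb
    return daily, daily * 30
```

Plug in your own average capture size once you have real data; image-heavy pages can easily double these estimates.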

A simple retention policy keeps things manageable. Keep daily snapshots for 30 days, weekly for 6 months, and monthly indefinitely:

# cleanup.sh - Run weekly
# Simplest version: delete every snapshot older than 30 days
# (no weekly/monthly tiers - see below for the smarter approach)
find ./archive -name "*.png" -mtime +30 -delete

# Or be smarter: keep first capture of each week after 30 days
# (implement in your archiving script with manifest.json filtering)
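One way to implement the tiered policy in your archiving script: keep everything under 30 days, the first capture of each ISO week up to ~6 months, and the first capture of each month beyond that. This works directly on the ISO timestamps the manifest stores:

```python
from datetime import datetime, timedelta

def retained(captures, now=None):
    """Return the subset of ISO-timestamp strings to keep."""
    now = now or datetime.now()
    keep, seen_weeks, seen_months = [], set(), set()
    for ts in sorted(captures):
        dt = datetime.fromisoformat(ts)
        age = now - dt
        if age <= timedelta(days=30):
            keep.append(ts)  # daily tier: keep everything
        elif age <= timedelta(days=180):
            wk = (dt.isocalendar().year, dt.isocalendar().week)
            if wk not in seen_weeks:  # weekly tier: first of each ISO week
                seen_weeks.add(wk)
                keep.append(ts)
        else:
            mo = (dt.year, dt.month)
            if mo not in seen_months:  # monthly tier: first of each month
                seen_months.add(mo)
                keep.append(ts)
    return keep
```

Anything not in the returned set can be deleted (and pruned from manifest.json).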

Advanced: Building a Visual Diff Viewer

Once you have timestamped screenshots, you can build a simple viewer to compare them side by side. Pair this with GrabShot's DiffShot tool for pixel-level visual regression comparisons.
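A minimal version of such a viewer is a single generated HTML page that pairs each capture with the one before it. A sketch that walks the archive layout produced by the scripts above:

```python
import html
from pathlib import Path

def write_compare_page(archive_dir: str, out_file: str = "compare.html") -> str:
    """Write a before/after HTML page for every consecutive capture pair."""
    rows = []
    for domain_dir in sorted(Path(archive_dir).iterdir()):
        if not domain_dir.is_dir():
            continue
        shots = sorted(domain_dir.glob("*.png"))  # timestamped names sort chronologically
        for before, after in zip(shots, shots[1:]):
            rows.append(
                f"<h2>{html.escape(domain_dir.name)}: "
                f"{before.name} vs {after.name}</h2>"
                f'<div style="display:flex;gap:8px">'
                f'<img src="{before}" width="48%">'
                f'<img src="{after}" width="48%"></div>'
            )
    page = "<!doctype html><body>" + "".join(rows) + "</body>"
    Path(archive_dir, out_file).write_text(page)
    return page
```

Open the generated compare.html in a browser to scrub through the timeline visually.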

For automated alerting, extend the Python script above to send notifications (Slack, email, webhook) whenever a change is detected. Note that the hash comparison flags any byte-level difference: it catches subtle visual changes like font rendering shifts, but it will also fire on rotating ad content or embedded timestamps, so expect some noise on dynamic pages.
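The Slack variant is a few lines. A hedged sketch: it builds a standard incoming-webhook payload from the `changes` list the Python script collects; SLACK_WEBHOOK_URL is a placeholder for a webhook you create in your own workspace:

```python
import os

def build_alert(changed_urls):
    """Return a Slack incoming-webhook payload for the changed pages."""
    lines = "\n".join(f"- {u}" for u in changed_urls)
    return {"text": f"{len(changed_urls)} archived page(s) changed:\n{lines}"}

def notify(changed_urls):
    """POST the alert to Slack; no-op when nothing changed."""
    if not changed_urls:
        return
    import requests  # deferred so build_alert stays testable offline
    requests.post(
        os.environ["SLACK_WEBHOOK_URL"],
        json=build_alert(changed_urls),
        timeout=10,
    )
```

Call `notify(changes)` at the end of `archive()` and you have change alerts without any extra infrastructure.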

Start Archiving Websites Today

GrabShot's screenshot API gives you full-page captures, custom viewports, and reliable rendering. 25 free screenshots per month to get started.

Try It Free →

Tips for Reliable Archiving

  1. Use full-page captures - Always set fullPage=true so you don't miss content below the fold.
  2. Set a consistent viewport - Use width=1440 for desktop archives. Capture mobile separately at width=375 if needed.
  3. Add delays for dynamic content - Some pages load content lazily. Use the delay parameter (e.g., delay=3000) to wait for JavaScript to finish.
  4. Handle failures gracefully - Sites go down. Your archiver should log failures and retry, not crash.
  5. Keep a manifest - The JSON manifest makes it easy to search, filter, and build UIs on top of your archive.
  6. Version your archiver - Put your script in git. When you change capture settings, you'll want to know when and why.

Legal Considerations

Archiving public websites for internal review, competitive analysis, or compliance is generally acceptable. However:

  - Check the site's terms of service - some explicitly prohibit automated access.
  - Keep captures internal - republishing screenshots of someone else's content can raise copyright issues.
  - Capture politely - space out requests (as the scripts above do) rather than hammering a site.
  - Consult counsel for evidence - if screenshots may end up in legal proceedings, ask a lawyer about admissibility and record-keeping requirements first.

Wrapping Up

Building a visual website archive takes about 30 minutes to set up and runs on autopilot from there. Whether you're tracking competitors, maintaining compliance records, or just want a visual history of the web, a screenshot API makes it trivially easy.

The code examples above work out of the box. Clone them, swap in your URLs and API key, set up a cron job, and you've got your own personal Wayback Machine that captures exactly what you need, when you need it.