How We Cleaned 470 WordPress Posts With AI

November 7, 2025
By Kevin Gilleard

Read the non-techie version here

TL;DR — We built an n8n workflow that pulled ~470 posts (15 years of content) from a WordPress site, ran each post through a chain of focused AI sub-agents (HTML cleanup → headings → URL normalization → list fixes → image alt text → typo pass), logged every action to a database, and pushed cleaned HTML back to WordPress. I also built a small JS review app (with Windsurf) to diff original vs AI-edited content so a human can accept/reject changes. Accuracy landed around 85–90% on first pass. Next, I’m gating the “update in WP” step until a human approves. Below is what it took, how it works, and how you can adapt it.

Why bother?

Old posts accumulate junk: inline MSO markup, malformed lists, random <span> wrappers, heading chaos, and hot-linked images. Multiply that by hundreds of posts and you’ve got a slow, fragile site that’s harder to maintain and rank.

The goal: preserve meaning and rendering while getting semantic, standards-compliant, accessible HTML into WordPress—without wasting weeks of manual cleanup.

The architecture at a glance

n8n workflow example

Pipeline:

  1. Fetcher – Get posts (REST or DB query) into n8n in batches.
  2. Master Agent (system prompt orchestrator) – Calls sub-agents in a strict order.
  3. HTML Cache – Each sub-agent reads/writes via cache (no giant payloads passed around).
  4. Validator & Diff – Make sure content is visually equivalent and structurally valid.
  5. Audit Log (DB) – Every decision written to Postgres.
  6. Updater – Pushes cleaned HTML to WordPress (temporarily direct; next iteration gates this behind human approval).
  7. Reviewer UI (Windsurf app) – Human approves/rejects with a side-by-side diff and notes.
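As a rough illustration of step 1, here is a minimal Python sketch of batched fetching over the WordPress REST API. It relies on the standard `per_page` and `page` query parameters (the API caps `per_page` at 100) and on the total page count that WordPress reports in the `X-WP-TotalPages` response header. The `fetch_page` callable is a hypothetical stand-in for whatever makes the HTTP call; in the actual workflow n8n does this part.

```python
from typing import Callable, Iterator

# Hypothetical transport: takes a URL, returns (posts_on_this_page, total_pages).
# total_pages would normally be read from the X-WP-TotalPages response header.
FetchPage = Callable[[str], tuple[list[dict], int]]


def iter_post_batches(
    base_url: str, fetch_page: FetchPage, per_page: int = 50
) -> Iterator[list[dict]]:
    """Yield one batch of posts per REST page until the last page is reached."""
    if not 1 <= per_page <= 100:  # the WordPress REST API caps per_page at 100
        raise ValueError("per_page must be between 1 and 100")
    page = 1
    while True:
        url = (
            f"{base_url.rstrip('/')}/wp-json/wp/v2/posts"
            f"?per_page={per_page}&page={page}"
        )
        posts, total_pages = fetch_page(url)
        if posts:
            yield posts
        if page >= total_pages:
            break
        page += 1
```

Keeping the transport injectable makes it easy to dry-run the batching logic against canned data before pointing it at a live site.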

LLM: Grok-4 Fast Reasoning (great speed/quality balance for this workload).

Time to build & run: ~15–18 hours (research, testing, refinement, full batch).

Throughput now: < 1 day for ~1000 posts.

The Master Agent (the brain)

I used a Master Agent prompt that strictly enforces tool sequencing, caching, and “don’t change meaning.” Highlights:

  • Strict order of sub-agents (cleanup → headings → links → lists → alt text → typo pass).
  • Cache discipline: sub-agents fetch from cache by post_id, write back to cache, and report compliance before next step.
  • Integrity & safety: preserve shortcodes, embeds, classes, ARIA, attributes, and internal links; convert <h1> → <h2>; remove bold inside headings; convert faux lists; normalize punctuation/typography; pretty-print HTML; run a final sanity pass.

[Screenshot Placeholder: All the sub-agents that process the HTML]
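The sequencing and cache discipline described above can be sketched in plain Python. This is not the Master Agent prompt itself, only an illustration of the control flow it enforces. The stage names, the dict used as a cache, and the shape of a sub-agent (a callable returning cleaned HTML plus a report string) are all assumptions made for the example.

```python
from typing import Callable

# Fixed order taken from the article; the stage names are illustrative.
PIPELINE = ["cleanup", "headings", "links", "lists", "alt_text", "typos"]

# Assumed sub-agent shape: HTML in, (cleaned HTML, report text) out.
SubAgent = Callable[[str], tuple[str, str]]


def run_pipeline(
    post_id: int, cache: dict[int, str], agents: dict[str, SubAgent]
) -> list[dict]:
    """Run every sub-agent in order; HTML travels only through the cache."""
    log = []
    for name in PIPELINE:
        before = cache[post_id]  # each stage reads the current HTML from the cache
        after, report = agents[name](before)
        if not after.strip():  # stop the chain rather than cache an empty result
            log.append({"post_id": post_id, "tool": name,
                        "status": "failed", "report": report})
            break
        cache[post_id] = after  # and writes its result back before the next stage
        log.append({"post_id": post_id, "tool": name, "status": "ok",
                    "char_delta": len(after) - len(before), "report": report})
    return log
```

Each log entry mirrors the kind of row the article writes to Postgres (post id, tool name, status, character delta), so a failed stage leaves both a record and the last good HTML in the cache.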

Sub-agents (what each one does)

Agent | What it fixes | Notes
----- | ------------- | -----
HTML General Cleanup | MS Word junk, empty <span>, invalid tags, nonstandard attributes | Leaves layout classes alone
Heading Hierarchy & Punctuation | Demotes <h1>→<h2>, fixes nesting & punctuation, strips <b> inside headings | Critical for accessibility/SEO
URL & Link Standardizer | Normalizes internal links (relative), validates externals | Optional per post
Faux List → HTML | Turns dashed/numbered paragraphs into <ul>/<ol> + <li> | Preserves original order
Image Alt Text Generator | Adds/normalizes alt from context | No change to src, width/height
Typo & Text Normalization | Light grammar/typo fixes only | No rewriting or summarizing
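To make one row of the table concrete, here is a deterministic Python sketch of two heading rules: demote `<h1>` to `<h2>` and strip `<b>`/`<strong>` inside headings. In the workflow an LLM sub-agent does this; the regex version below is only an assumption-laden illustration that could double as a test oracle. It does not handle nested or malformed headings, for which a real HTML parser would be safer.

```python
import re

# Matches a heading element with its attributes and inner HTML.
HEADING = re.compile(r"<(h[1-6])\b([^>]*)>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)
# Matches opening or closing <b>/<strong> tags (but not <br> or <body>).
BOLD = re.compile(r"</?(?:b|strong)\b[^>]*>", re.IGNORECASE)


def normalize_headings(html: str) -> str:
    """Demote h1 to h2 and remove bold wrappers inside any heading."""

    def fix(match: re.Match) -> str:
        tag = match.group(1).lower()
        attrs, inner = match.group(2), match.group(3)
        if tag == "h1":
            tag = "h2"  # the post title already owns the page-level heading
        inner = BOLD.sub("", inner)
        return f"<{tag}{attrs}>{inner}</{tag}>"

    return HEADING.sub(fix, html)
```

Attributes on the heading are carried over untouched, and bold text outside headings is left alone, which matches the "preserve classes and attributes" rule in the prompt highlights.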

The n8n workflow details

The main workflow

  • Trigger: “When chat message received” or manual run for batch sets.
  • Loop Over Items: iterates posts; skips if “already processed.”
  • Cache nodes: Cache Initial Post HTML and Update Cache with Processed HTML Output are called by each sub-agent; the Master Agent passes only post_id.
  • Structured Output Parser: ensures each sub-agent returns a well-formed CleanHTML + Report object.
  • DB Logging: Each step writes to Postgres (post id, tool name, status, warnings, character deltas).
  • Updater: Calls the WordPress REST API to update the post HTML (temporarily direct; see “What’s next”).
  • Messaging: Progressive feedback to chat (or logs) so I know where a given post is in the pipeline.
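The structured-output check and the character delta written to the log can be approximated as follows. The field names `CleanHTML` and `Report` come from the article; treating both as strings and rejecting an empty `CleanHTML` are assumptions for this sketch, not the parser node's actual schema.

```python
import json


def parse_agent_output(raw: str, original_html: str) -> dict:
    """Validate a sub-agent reply and derive the fields for one audit-log row."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    clean, report = data.get("CleanHTML"), data.get("Report")
    if not isinstance(clean, str) or not clean.strip():
        raise ValueError("CleanHTML must be a non-empty string")
    if not isinstance(report, str):
        raise ValueError("Report must be a string")
    return {
        "clean_html": clean,
        "report": report,
        # Negative delta means the sub-agent removed markup or text.
        "char_delta": len(clean) - len(original_html),
    }
```

A large negative delta from a stage that should only tidy markup is a cheap signal that the post deserves a closer look in review.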

The review app (human-in-the-loop)

Human in the loop review app coded with Windsurf.

I built a small web app with JS (Windsurf assisted) that reads the audit DB and shows:

  • Side-by-side diff of original vs AI-edited HTML [Screenshot Placeholder: Diff view]
  • Per-tool notes (e.g., “Heading agent promoted H3→H2 in section 4”)
  • Quick actions: Accept, Reject, or Edit then Accept
  • Queue filters: “needs review,” “accepted,” “rejected,” “rework”

This made it painless to spot the ~10–15% that needed a human touch (nested shortcodes, edge-case tables, weird legacy embeds).
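The review app itself is written in JS and is not reproduced here. As a language-neutral illustration of the diff step, this Python sketch uses the standard library's `difflib` to produce a unified diff and a rough "how much changed" score. The function names and the idea of sorting the review queue by that score are assumptions, not features of the actual app.

```python
import difflib


def html_diff(original: str, edited: str) -> list[str]:
    """Return unified-diff lines between the original and AI-edited HTML."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            edited.splitlines(),
            fromfile="original",
            tofile="ai_edited",
            lineterm="",
        )
    )


def change_ratio(original: str, edited: str) -> float:
    """0.0 means identical; values near 1.0 mean almost everything changed."""
    return 1.0 - difflib.SequenceMatcher(None, original, edited).ratio()
```

Pretty-printing both versions before diffing keeps the diff line-oriented, which is one practical reason for the pretty-print step in the pipeline.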

Results after first run

  • Coverage: 470 posts across 15 years
  • Accuracy: ~85–90% “good to ship” with zero edits
  • Common wins: smaller DOM, valid headings, cleaner lists, consistent alt text, safer links
  • Time saved: what would be weeks of manual cleanup collapsed into hours

Examples from the run (screenshot captions):

  • The LLM caught a typo that was itself a correctly spelled word and removed the <strong> tags.
  • The LLM added image alt text by downloading and reading the image in the flow.
  • A fake list built from bullet characters was replaced with a semantic HTML list by the dedicated sub-agent.
  • The LLM linked email addresses and phone numbers in the text.
  • A combination of many fixes in one post: multiple alt attributes written, arbitrary formatting removed, a hot-linked external image flagged, underlines converted to italics, and punctuation corrected.

What’s next (and how we’re changing the flow)

Right now the workflow updates WordPress right after the AI pass. I’m switching to review-first:

  1. After sub-agents finish, write the cleaned HTML to the DB (not WP).
  2. Mark the post PENDING_REVIEW.
  3. The reviewer UI changes the status to APPROVED or REJECTED (with optional manual edits captured).
  4. A separate n8n workflow (or a cron-triggered node) only updates WordPress for approved rows.

Implementation sketch (n8n):

  • Add a “Write Draft HTML” node → Postgres table post_edits:
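-- Parameters: $1 = post_id, $2 = original HTML, $3 = cleaned HTML, $4 = report.
-- Assumes a unique constraint on post_id and an updated_at column.
INSERT INTO post_edits (post_id, html_original, html_clean, status, notes)
VALUES ($1, $2, $3, 'PENDING_REVIEW', $4)
ON CONFLICT (post_id) DO UPDATE
  SET html_clean = EXCLUDED.html_clean,
      notes = EXCLUDED.notes,
      status = 'PENDING_REVIEW',
      updated_at = NOW();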
  • Add a separate “Apply Approved Edits” workflow that:
    • Selects status = 'APPROVED'
    • Updates WP via REST
    • Writes audit row & sets status = 'APPLIED'
  • Optional: “Rework Requested” branch that re-queues a single sub-agent (e.g., only rerun “Faux List → HTML” for a post).
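The review-first flow amounts to a small state machine, sketched below in Python. `PENDING_REVIEW`, `APPROVED`, `REJECTED` and `APPLIED` are the statuses named above; `REWORK` is an assumed name for the optional rework branch, and the transition table is one reasonable reading of the steps rather than the final design.

```python
# Allowed status transitions for a row in post_edits.
# REWORK is a hypothetical status for the optional "Rework Requested" branch.
ALLOWED = {
    "PENDING_REVIEW": {"APPROVED", "REJECTED", "REWORK"},
    "REWORK": {"PENDING_REVIEW"},  # a re-run sub-agent sends the post back to review
    "APPROVED": {"APPLIED"},       # only the apply workflow makes this move
    "REJECTED": set(),
    "APPLIED": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def rows_to_apply(rows: list[dict]) -> list[dict]:
    """Select only approved rows, mirroring the 'Apply Approved Edits' workflow."""
    return [row for row in rows if row["status"] == "APPROVED"]
```

Enforcing the table in one place means the updater can never push a row that skipped review, even if a workflow is triggered by hand.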

Gotchas & practical notes

  • Don’t pass giant HTML strings between nodes. Use the cache pattern from the Master Agent prompt: pass only post_id and let each sub-agent read and write HTML through the cache. Faster, safer, cheaper.
  • Alt text: keep it descriptive but not fluffy. If the LLM doesn’t have context, prefer alt="" for decorative images.
  • Headings: enforce one H2 “title” equivalent per document, then H3/H4 for sections. Remove <strong> around headings: a heading already carries its own emphasis, so the extra tag is redundant markup that gets in the way of template-level styling.
  • Relative internal links: simplify future domain moves and staging workflows, and avoid the extra DNS lookups and 301 redirects that stale absolute URLs (old domain, http://, www variants) can trigger.
  • Shortcodes & embeds: always preserve; never “clean” them.
  • Final sanity pass: pretty-print, close tags, normalize entities (“—” not &mdash; in copy), and verify no images or content disappeared.
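The "verify no images or content disappeared" check from the last bullet can be approximated with the standard library. This is a minimal sketch, not the validator used in the workflow: the 90% text threshold, the shortcode regex, and the function names are all assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser

# Rough shortcode pattern, e.g. [gallery id="3"] or [/caption]; an assumption.
SHORTCODE = re.compile(r"\[/?[A-Za-z][\w-]*[^\]]*\]")


class _Inventory(HTMLParser):
    """Collects visible text chunks and <img> src values."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs).get("src"))

    def handle_data(self, data):
        self.text.append(data)


def inventory(html: str) -> dict:
    parser = _Inventory()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(" ".join(parser.text).split()),  # whitespace-normalized
        "images": parser.images,
        "shortcodes": SHORTCODE.findall(html),
    }


def sanity_problems(original: str, cleaned: str, min_text_ratio: float = 0.9) -> list[str]:
    """Return a list of problems; an empty list means the post passed."""
    before, after = inventory(original), inventory(cleaned)
    problems = []
    if before["images"] != after["images"]:
        problems.append("images changed")
    if before["shortcodes"] != after["shortcodes"]:
        problems.append("shortcodes changed")
    if len(after["text"]) < min_text_ratio * len(before["text"]):
        problems.append("visible text shrank")
    return problems
```

The text check is deliberately a length ratio rather than an equality test, because the typo and list passes are expected to change a few characters.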

Example: WordPress update snippet (REST)

POST /wp-json/wp/v2/posts/{id}
Authorization: Bearer <token>
Content-Type: application/json

{
  "content": "<!-- cleaned HTML here -->"
}

In n8n, I use the HTTP Request node with OAuth/app password or JWT depending on the site.
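For readers outside n8n, the same request can be built with Python's standard library. This sketch assumes the application-password option, which WordPress accepts as HTTP Basic auth; the function name and the example values in the usage note are made up. It only constructs the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import base64
import json
import urllib.request


def build_update_request(
    site: str, post_id: int, html: str, user: str, app_password: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that replaces a post's content."""
    token = base64.b64encode(f"{user}:{app_password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"content": html}).encode("utf-8")
    return urllib.request.Request(
        url=f"{site.rstrip('/')}/wp-json/wp/v2/posts/{post_id}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",  # application password via Basic auth
            "Content-Type": "application/json",
        },
    )
```

Usage would look like `urllib.request.urlopen(build_update_request("https://example.com", 42, cleaned_html, "editor", "xxxx xxxx"))`, ideally against a staging site first.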

Performance & cost

  • Batching: I ran in chunks to avoid long-running single executions.
  • Retries: If a tool returns invalid JSON, n8n retries that node with backoff.
  • Cost: Grok-4 Fast Reasoning was efficient; the cache+sub-agent pattern keeps prompts small.
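In n8n the retry behavior is a node setting, so there is no code to show from the workflow. The Python sketch below only illustrates the retry-on-invalid-JSON idea; the attempt count, the doubling delay, and the function name are assumptions.

```python
import json
import time
from typing import Callable


def call_with_retry(
    call: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call a tool until it returns valid JSON, waiting longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        raw = call()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise  # give up and let the caller log the failure
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.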

Where human judgment still wins

  • Posts with weird legacy builders (inline tables, copy-pasted email templates).
  • Edge-case typography in legal or medical posts where exact punctuation matters.
  • Posts with hot-linked media that should be hosted locally.

That’s why the review UI exists. The AI handles the heavy lifting; humans approve the final mile.

Closing

This project proves the obvious (but often ignored) point: AI is a fantastic teammate when you split the work into small, deterministic “hats.” Give each sub-agent a narrow job, enforce order, keep state in a cache, and log everything. Pair that with a quick human review loop and you get the best of both worlds—speed and safety.

If you want the prompts, node JSON, or the little review app scaffold, say the word and I’ll package a starter kit.

Citations & useful references

  1. WordPress Coding Standards (HTML/CSS/JS): https://developer.wordpress.org/coding-standards/wordpress-coding-standards/html/
  2. WordPress REST API Handbook: https://developer.wordpress.org/rest-api/
  3. WHATWG HTML Living Standard: https://html.spec.whatwg.org/
  4. WAI-ARIA 1.2 (accessibility roles, properties, states): https://www.w3.org/TR/wai-aria-1.2/
  5. n8n Documentation: https://docs.n8n.io/
  6. Writing good alt text (W3C): https://www.w3.org/WAI/tutorials/images/
  7. xAI / Grok API (general): https://x.ai/

