factSocial

How it works

Every fact-check follows the same transparent flow: the extension reads the post, sends it to an LLM you've chosen with a carefully tuned system prompt, lets the model call a web-search tool when claims are time-sensitive, then renders a structured verdict under the post. This page walks through what's actually happening at each step, with the full system prompt reproduced verbatim.

At a glance

In plain language, here's what happens between your click and the verdict card:

  1. You click the Fact-check button next to a Threads or Bluesky post.
  2. The extension reads the post text from the page DOM, skipping usernames, timestamps, and counts.
  3. It sends the text to an LLM you've configured — either LM Studio running locally on your machine, or any model on OpenRouter via your API key. A system prompt instructs the LLM to act as a fact-checker.
  4. The LLM may call a web_search tool, run several queries to verify the claims, read the snippets, then compose its answer. The live progress card shows every search and its top results as they happen.
  5. The LLM returns a structured JSON verdict — verdict label, confidence score, summary, per-claim breakdown, sources, and caveats — which the extension renders into a card under the post.

Nothing routes through a server operated by the extension's author. The only network requests are to your chosen LLM provider and your chosen search provider. There is no factSocial backend.
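
For the curious, here is a minimal sketch of steps 2 and 3 as a WebExtension content script might implement them. The selectors, the port name, and the message shape are illustrative assumptions, not the extension's shipped code:

// content-script.js — illustrative sketch only; selectors and message shape are assumptions
const port = browser.runtime.connect({ name: "factcheck" });   // hypothetical port name

function extractPostText(postEl) {
  // Grab the post body while skipping usernames, timestamps, and counts.
  // "[data-post-body]" is a stand-in selector, not the real one.
  const body = postEl.querySelector("[data-post-body]");
  return (body || postEl).textContent.trim();
}

document.addEventListener("click", (event) => {
  const button = event.target.closest(".factcheck-button");    // hypothetical class name
  if (!button) return;
  const postEl = button.closest("article");                    // hypothetical post container
  port.postMessage({ type: "factcheck", text: extractPostText(postEl) });
});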

The data flow

One arrow per network hop, deduplicated to the actual destinations:

[ Threads / Bluesky post in your browser ]
                │
                │  click Fact-check
                ▼
[ Content script ]   reads post text from page DOM
                │
                │  port message to background
                ▼
[ Background script ]
                │
                ├──► LM Studio (localhost)   ─── or ───►   OpenRouter (cloud)
                │       /v1/chat/completions               /api/v1/chat/completions
                │              │
                │     model returns tool_call
                │              │
                │              ▼
                │     ┌──► DuckDuckGo  /  Brave Search  /  Tavily
                │     │       (your chosen search provider)
                │     │
                │     └── results returned to model as a tool message
                │              │
                │     model returns next tool_call OR final JSON
                │              │
                │  loop up to 6 turns until model emits final answer
                ▼
[ Verdict card renders inline under the post ]
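
To make the first hop concrete, here is a sketch of the single chat-completions request the background script could send. The endpoints and the web_search tool name come from the diagram above; the LM Studio port, the helper names, and the exact tool schema are assumptions for illustration:

// background.js — sketch of one OpenAI-compatible request; helper and field details are assumptions
const webSearchTool = {
  type: "function",
  function: {
    name: "web_search",
    description: "Search the web and return result snippets.",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
};

async function chatCompletion({ useLocal, apiKey, model, messages, tools, responseFormat }) {
  const endpoint = useLocal
    ? "http://localhost:1234/v1/chat/completions"        // LM Studio's default port (assumption)
    : "https://openrouter.ai/api/v1/chat/completions";   // OpenRouter
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(useLocal ? {} : { Authorization: `Bearer ${apiKey}` }),
    },
    body: JSON.stringify({
      model,
      messages,
      ...(tools ? { tools } : {}),
      ...(responseFormat ? { response_format: responseFormat } : {}),
    }),
  });
  const data = await res.json();
  return data.choices[0].message;   // either tool_calls or the final JSON verdict
}

The later sketches on this page reuse chatCompletion and webSearchTool.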

The full system prompt

This is the exact text the extension prepends to your post on every fact-check, before the current date and timezone get appended. You can replace it with your own prompt in the extension's preferences page; the factory default ships in prompts/system-prompt.js inside the XPI.

You are a careful, neutral fact-checker. Given a single social media post, identify its factual claims and assess them. Be honest about uncertainty. Do NOT invent citations or sources.

You have access to a "web_search" tool. You MUST use it to verify any post that mentions specific events, people's current status, dates, statistics, prices, or breaking news — your training data is months or years old and cannot be trusted for time-sensitive claims. Run several targeted searches (try different phrasings, named entities, and direct quotes from the post) before forming a verdict. The ONLY time you may skip searching is when the post is pure opinion, a joke, or a personal experience with no factual claim.

When you have enough evidence, reply with ONLY a single JSON object matching this exact schema. NO prose. NO chain-of-thought. NO markdown fences. NO commentary before or after. Just the JSON:

{
  "verdict": "True" | "MostlyTrue" | "Mixed" | "Misleading" | "False" | "Unverifiable" | "Opinion",
  "confidence": <integer 0-100>,
  "summary": "<2-3 sentence plain-language assessment of the post overall>",
  "claims": [
    { "claim": "<a specific factual claim from the post>", "assessment": "<short evaluation of that claim, citing sources from search results when relevant>" }
  ],
  "caveats": "<what you could not verify or where reasonable people might disagree>",
  "sources": [
    { "title": "<page title from a search result you actually used>", "url": "<URL from that result>" }
  ]
}

Only include sources you genuinely used (taken from web_search results). If you skipped searching because the post is opinion/joke/personal, return an empty "sources" array.

CONFIDENCE CALIBRATION — apply this rubric strictly. Do not default to 90+. Calibration matters more than sounding authoritative.

  90–100  Multiple (≥2) independent, authoritative sources from web_search results directly confirm or refute the claim — e.g. reputable news outlets, primary documents, official statements, peer-reviewed research. Virtually no plausible alternative interpretation.

  75–89   One strong source confirms/refutes, OR several weaker/secondary sources agree. The core claim is verifiable but secondary details (exact figures, dates, locations) may vary slightly across sources.

  60–74   Partial verification. Some elements of the claim are confirmed by your sources; others are unaddressed. OR you found only a single source that broadly supports the claim.

  40–59   Mixed evidence. Sources contain substantive disagreement, OR your search returned relevant results that don't cleanly resolve the claim. Pair this band with verdicts "Mixed" or "Misleading".

  20–39   Weak evidence. Search returned only tangentially-related results, OR you are relying mostly on training-data recall for a time-sensitive topic. Strongly consider verdict "Unverifiable".

  0–19    You could not verify the claim from search results at all. Verdict should be "Unverifiable" unless the claim is logically self-refuting or rests on a clearly debunked premise.

ANTI-PATTERNS — these suppress confidence even if other criteria are met:
  • Did not call web_search on a time-sensitive claim → cap confidence at 40.
  • Sources are dated more than a year before today's date for a current-events claim → cap at 50.
  • Single source that is itself a social-media post or unverified blog → cap at 50.
  • Claim hinges on a specific number/date and your sources only mention an approximate range → cap at 70.

For "Opinion" verdicts (post contains no factual claim to check), report confidence in your *classification* of the post as opinion. Typical range: 80–95.

If a tool call is appropriate, emit it instead of writing prose. If you are ready to answer, emit ONLY the JSON object.

A short context block is appended to this prompt at request time, containing the current date, time, and timezone. That anchor matters a lot: without it, models reason about "today" relative to their training cutoff, which is often a year or more out of date.
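
A sketch of how that anchor might be appended; the exact wording of the shipped context block may differ:

// Append the current date, time, and timezone to the system prompt at request time.
// The phrasing of the context block is an assumption.
function buildSystemPrompt(basePrompt) {
  const now = new Date();
  const timeZone = Intl.DateTimeFormat().resolvedOptions().timeZone;
  return `${basePrompt}\n\nCurrent date and time: ${now.toString()} (timezone: ${timeZone}).`;
}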

Anatomy of the prompt

The prompt is structured in six sections, each addressing a specific failure mode we observed in earlier versions.

1. Role and tone

The opening line frames the model as a careful, neutral fact-checker. "Careful" and "honest about uncertainty" target overconfidence. "Neutral" suppresses ideological framing. The explicit "Do NOT invent citations" line exists because models asked for sources, absent such a restraint, frequently hallucinate plausible-looking URLs.

2. The web-search mandate

The second paragraph says MUST in capital letters and lists explicit categories of claims that require a search: events, people's current status, dates, statistics, prices, breaking news. This wording emerged from the observation that models often skip the tool when they "feel like they know the answer", which is exactly the case where they're least reliable.

The mandate also tells the model to run multiple searches with different phrasings — single-query answers tend to anchor on whatever the first result happens to be.

3. Strict JSON-only output rule

"Reply with ONLY a single JSON object" is repeated four times: NO prose. NO chain-of-thought. NO markdown fences. NO commentary. This redundancy exists because reasoning models in particular leak their thinking into the visible output. The extension's parser handles fenced JSON, leading prose, and <think> blocks as a fallback, but the sledgehammer wording reduces how often the fallback fires.

4. The confidence calibration rubric

Six bands tied to evidence quality, ordered descending. Each band names what kind of evidence justifies it. The leading instruction "do not default to 90+" counteracts the habit models have of returning 95% confidence as a default neutral value. We expand on the bands in the calibration section.

5. Anti-pattern caps

Four hard caps that fire regardless of which band the rest of the assessment lands in. They target the specific overconfidence patterns that survive the rubric and are listed in the ANTI-PATTERNS block of the prompt above (and unpacked in the anti-pattern caps section below).

The caps are intentionally simple to apply mechanically; no fuzzy judgment is required.

6. Opinion-verdict special case

When a post is pure opinion (no factual claim), the model is told to report confidence about its classification of the post as opinion, not the truth of the opinion. Without this carve-out, models would either return 0% confidence ("I can't fact-check an opinion") or invent factual claims to assess.

The verdict schema

Six required fields, enforced via the OpenAI-compatible response_format: json_schema parameter when the provider supports it, and via prompt-only instructions otherwise.

verdict (enum)
    One of: True, MostlyTrue, Mixed, Misleading, False, Unverifiable, Opinion. The visible pill at the top of the result card is colored from this value.

confidence (integer 0–100)
    The model's calibrated certainty in its verdict, governed by the rubric and anti-pattern caps. Rendered as the percentage next to the verdict pill.

summary (string)
    A 2–3 sentence plain-language overview of the assessment. Rendered immediately under the verdict pill.

claims (array of {claim, assessment})
    Each factual claim from the post and its individual evaluation. Rendered as a collapsible Claims (N) list. Posts with one claim get one entry; posts with multiple claims get an entry each.

caveats (string)
    What the model could not verify, or where reasonable people might disagree. Rendered as the "Caveats:" line near the bottom of the card.

sources (array of {title, url})
    Sources the model actually used. Rendered as clickable domain chips (e.g. nytimes.com) at the bottom of the card. If the model omits this field but actually performed searches, the extension backfills it from the search results that were returned to the model.
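
For providers that support structured output, the request carries a response_format parameter shaped roughly like the sketch below. The field names mirror the schema above; the exact constraints in the shipped schema may differ:

// Sketch of the structured-output parameter (OpenAI-compatible "json_schema" shape).
const responseFormat = {
  type: "json_schema",
  json_schema: {
    name: "fact_check_verdict",
    strict: true,
    schema: {
      type: "object",
      properties: {
        verdict: {
          type: "string",
          enum: ["True", "MostlyTrue", "Mixed", "Misleading", "False", "Unverifiable", "Opinion"],
        },
        confidence: { type: "integer", minimum: 0, maximum: 100 },
        summary: { type: "string" },
        claims: {
          type: "array",
          items: {
            type: "object",
            properties: { claim: { type: "string" }, assessment: { type: "string" } },
            required: ["claim", "assessment"],
          },
        },
        caveats: { type: "string" },
        sources: {
          type: "array",
          items: {
            type: "object",
            properties: { title: { type: "string" }, url: { type: "string" } },
            required: ["title", "url"],
          },
        },
      },
      required: ["verdict", "confidence", "summary", "claims", "caveats", "sources"],
    },
  },
};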

The tool-call loop

Most fact-checks involve at least one round-trip with the search provider. Here's what's happening across multiple turns:

TURN 1
  → Extension sends:  system prompt + user post
  ← Model returns:    tool_call("web_search", {query: "..."})

TURN 2
  → Extension sends:  same history + tool result (search snippets)
  ← Model returns:    tool_call("web_search", {query: "..."})   ← maybe more searches
                  OR  final JSON verdict                          ← if confident enough

TURN 3 ... TURN 6
  → Loop continues, capped at 6 turns total to bound time and tokens
  ← When the model returns content (no tool_calls), that's the answer

TURN N (forced final)
  → If we hit the iteration cap, extension drops the tools and re-asks
    with response_format: json_schema enforced. The model is forced to
    commit to whatever it knows.
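
A compact sketch of that loop, reusing the chatCompletion and webSearchTool sketches from the data-flow section and the responseFormat sketch from the schema section. runWebSearch stands in for whichever search provider you configured; the shipped implementation may differ:

// Tool-call loop: up to 6 turns, then a forced final answer with the tools removed.
async function factCheck(providerConfig, systemPrompt, postText) {
  const messages = [
    { role: "system", content: systemPrompt },
    { role: "user", content: postText },
  ];
  for (let turn = 0; turn < 6; turn++) {
    const msg = await chatCompletion({ ...providerConfig, messages, tools: [webSearchTool] });
    if (!msg.tool_calls || msg.tool_calls.length === 0) {
      return msg.content;                                 // final JSON verdict
    }
    messages.push(msg);                                   // keep the assistant turn in the history
    for (const call of msg.tool_calls) {
      const { query } = JSON.parse(call.function.arguments);
      const results = await runWebSearch(query);          // DuckDuckGo / Brave / Tavily
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(results) });
    }
  }
  // Iteration cap hit: drop the tools and force a schema-constrained answer.
  const forced = await chatCompletion({ ...providerConfig, messages, responseFormat });
  return forced.content;
}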

Two implementation details worth knowing:

Confidence calibration in detail

The calibration rubric was the single biggest improvement in v0.2.0 over earlier versions. Pre-rubric, model verdicts averaged in the 88–95 confidence range regardless of evidence quality. The rubric anchors the bands to specific evidence patterns:

  90–100  Strongly verified. Two or more independent, authoritative sources directly confirm or refute the claim.
  75–89   Verifiable. One strong source, or multiple weak sources in agreement. Secondary details may vary.
  60–74   Partial. Some elements confirmed, others unaddressed; or a single supportive source.
  40–59   Mixed. Sources disagree, or relevant results don't cleanly resolve the claim. Pairs with verdicts Mixed or Misleading.
  20–39   Weak. Tangentially related results only, or training-data recall on a time-sensitive topic. Pushes toward Unverifiable.
  0–19    Unverified. The model could not confirm anything from search. Verdict is Unverifiable unless the claim is self-refuting.

Why the rubric matters more than the verdict label. A verdict of "True" with 60% confidence is a much weaker claim than "True" with 95%. The confidence number is what tells you whether to trust the result, share it, or go look at the sources yourself. Reading only the verdict label loses most of the signal.

The anti-pattern caps

Even if the rest of the rubric points at, say, 85% confidence, these four checks can cap the score lower. They run regardless of band:

  • No web_search call on a time-sensitive claim → confidence capped at 40.
  • Sources dated more than a year before today's date for a current-events claim → capped at 50.
  • A single source that is itself a social-media post or unverified blog → capped at 50.
  • A claim that hinges on a specific number or date when the sources only give an approximate range → capped at 70.

Tweak it yourself

The system prompt above is the factory default. You can replace it with your own in the extension's preferences:

  1. Open about:addons in Firefox.
  2. Find factSocial → click the menu → Preferences.
  3. Scroll to System prompt and edit. Click Save.
  4. Click Reset to defaults at any time to restore the shipped version.

Common reasons to edit the prompt:

Validating prompt changes. Test on the same five posts before and after, with the same model. If verdicts shift in unexpected directions, the rubric is doing something different from what you intended. Preferences → Reset to defaults always gets you back to known-good.