Detection Methodology
Last updated: March 22, 2026
Overview
Baloney is a multi-modal AI content detection platform that uses cascading detection pipelines to classify content as human-created or AI-generated. Each pipeline combines multiple detection methods, from cryptographic watermark verification to commercial classifiers to local forensic analysis, running in priority order with early exit on high-confidence results. This document explains how detection works, what the data means, and what the limitations are.
Data Collection Method
Content reaches Baloney for analysis through three channels:
- Browser extension: Users right-click or select content on supported websites to initiate a scan. Detection is always user-initiated.
- Mobile app: Users choose content to analyze from their device. Scans are explicitly triggered, never automatic.
- API integrations: Third-party tools (WordPress plugin, SDK) submit content on behalf of their users for detection.
Users explicitly choose content to analyze. There is no automated crawling or scraping. Only cryptographic hashes (HMAC-SHA256) of content are retained; raw content is processed in memory and discarded after analysis completes.
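The hash-only retention model can be sketched as follows. This is a minimal illustration, not Baloney's actual implementation: the key handling and function name are assumptions.

```python
import hashlib
import hmac

# Hypothetical server-side secret; actual key management is not public.
SECRET_KEY = b"example-secret-key"

def content_fingerprint(content: bytes) -> str:
    """Return the HMAC-SHA256 hex digest used to identify content.

    Only this fingerprint is retained; the raw content is discarded
    after analysis. Using HMAC rather than a bare SHA-256 means the
    stored hashes cannot be matched against candidate content by
    outside parties who lack the key.
    """
    return hmac.new(SECRET_KEY, content, hashlib.sha256).hexdigest()

# Identical content always yields the same fingerprint, which is what
# enables cross-platform matching without storing the content itself.
a = content_fingerprint(b"the same article text")
b = content_fingerprint(b"the same article text")
assert a == b and len(a) == 64
```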
Detection Methods
Text Detection Pipeline
Text analysis uses a cascading pipeline that exits early when a high-confidence result is found at any stage:
- Stage 1: SynthID watermark detection (Google). Detects Gemini-generated text via embedded statistical watermarks. When present, watermarks provide near-certain identification of AI origin.
- Stage 2: Pangram API. Commercial AI text classifier with 99.85% accuracy (peer-reviewed: arXiv:2402.14873). Trained on a wide range of language models and writing styles.
- Stage 3: Statistical analysis. A 17-feature ensemble including burstiness, type-token ratio, perplexity, transition phrase density, hedging score, bigram entropy, and part-of-speech pattern analysis. These features capture stylistic regularities common in AI-generated text.
Confidence cap: Local-only statistical methods cannot trigger the highest confidence tier. This prevents false positives from weaker methods when commercial APIs are unavailable.
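The cascade, early exit, and confidence cap described above can be sketched as follows. The threshold values, stage names, and the `StageResult` type are illustrative assumptions; the actual tuning is not published.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class StageResult:
    score: float       # 0 = human, 1 = AI-generated
    confidence: float  # how certain this stage is of its score

# Illustrative thresholds, not the real tuning.
EARLY_EXIT_CONFIDENCE = 0.95
LOCAL_ONLY_CAP = 0.80  # Stage 3 (statistical) cannot reach the top tier

def run_text_pipeline(
    stages: List[Tuple[str, Callable[[str], Optional[StageResult]]]],
    text: str,
) -> Tuple[str, StageResult]:
    """Run stages in priority order with early exit on high confidence.

    A stage returns None when it has nothing to say (no watermark
    found, commercial API unavailable), and the cascade falls through
    to the next stage.
    """
    best = ("none", StageResult(0.5, 0.0))
    for name, stage in stages:
        result = stage(text)
        if result is None:
            continue
        if name == "statistical":
            # Local-only methods are capped so they can never trigger
            # the highest confidence tier on their own.
            result.confidence = min(result.confidence, LOCAL_ONLY_CAP)
        best = (name, result)
        if result.confidence >= EARLY_EXIT_CONFIDENCE:
            break  # strong evidence found: skip the remaining stages
    return best
```

For example, if Stage 1 finds a SynthID watermark at confidence 0.99, Stages 2 and 3 never run; a statistical-only result is returned with its confidence capped at 0.80.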
Image Detection Pipeline
Image analysis runs a cascading pipeline across multiple detection methods:
- Stage 0: C2PA Content Credentials. Verifies provenance metadata from tools like Adobe Firefly that embed cryptographic signatures attesting to content origin.
- Stage 1: SynthID image watermark (Google). Detects images generated by Google Imagen models via embedded imperceptible watermarks.
- Stage 2: SightEngine and Hive AI. SightEngine (98.3% accuracy, ARIA benchmark #1) and Hive AI (fallback) are commercial image classifiers. Hive identifies specific generators including DALL-E, Midjourney, Stable Diffusion, and others.
- Stage 3: Local forensic analysis. FFT/DCT frequency analysis, Error Level Analysis (ELA), noise pattern analysis, and EXIF metadata inspection. These methods detect artifacts typical of AI-generated images that differ from camera-captured photographs.
Consensus guardrail: Local-only methods require at least 3 of 4 forensic signals to agree before producing an AI-leaning score. This reduces false positives when commercial classifiers are not available.
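The 3-of-4 consensus guardrail can be sketched like this. The signal names and the 0.5 AI-leaning cutoff are assumptions for illustration, not the production values.

```python
from typing import Dict, Optional

FORENSIC_SIGNALS = ("fft_dct", "ela", "noise", "exif")

def consensus_score(signals: Dict[str, float]) -> Optional[float]:
    """Return an AI-leaning score only when >= 3 of 4 signals agree.

    Each signal scores 0 (camera-like) to 1 (AI-like). Without
    consensus, the local-only pipeline abstains (returns None)
    rather than risk a false positive.
    """
    ai_leaning = [s for s in FORENSIC_SIGNALS if signals.get(s, 0.0) > 0.5]
    if len(ai_leaning) < 3:
        return None  # insufficient agreement: no AI-leaning verdict
    return sum(signals[s] for s in ai_leaning) / len(ai_leaning)
```

With only two AI-leaning signals the function abstains; with three or more it returns the mean of the agreeing scores.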
Video Detection
Video analysis cascades through three methods in fallback order:
- SightEngine native video analysis with per-frame scoring provides the primary detection signal.
- Hive video detection serves as a fallback classifier.
- Frame-by-frame image analysis is used as a final fallback, applying the image detection pipeline to sampled frames from the video.
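The final fallback can be sketched as sampling evenly spaced frames and averaging their image-pipeline scores. The sampling rate and the mean aggregation are assumptions; the actual sampling strategy is not documented here.

```python
from typing import Callable

def score_video_frames(frame_count: int,
                       score_frame: Callable[[int], float],
                       max_samples: int = 10) -> float:
    """Apply an image-pipeline scorer to evenly spaced frame indices.

    score_frame takes a frame index and returns a 0 (human) to 1
    (AI-generated) score from the image detection pipeline.
    """
    if frame_count <= 0:
        raise ValueError("video has no frames")
    step = max(1, frame_count // max_samples)
    indices = list(range(0, frame_count, step))[:max_samples]
    scores = [score_frame(i) for i in indices]
    return sum(scores) / len(scores)
```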
Confidence Scoring
- Each detection method returns a score from 0 (human) to 1 (AI-generated), along with a weight reflecting its reliability.
- Methods are weighted by tier: watermark detection carries the highest weight, followed by commercial API classifiers, then local forensic analysis.
- Confidence caps prevent lower-tier methods from triggering high-confidence verdicts, ensuring that strong conclusions require strong evidence.
- The weighted aggregate score is mapped to a final verdict: likely_human (low scores), uncertain, possibly_ai, likely_ai, and ai_generated (high scores).
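The scoring steps above can be sketched as a weighted average plus a threshold map. The tier weights and verdict cutoffs below are illustrative assumptions, not the published calibration.

```python
from typing import List, Tuple

# Watermarks outweigh commercial APIs, which outweigh local forensics.
TIER_WEIGHTS = {"watermark": 1.0, "commercial_api": 0.7, "local_forensic": 0.4}

def aggregate(results: List[Tuple[str, float]]) -> float:
    """Weighted average of (tier, score) pairs, scores in [0, 1]."""
    total = sum(TIER_WEIGHTS[tier] for tier, _ in results)
    return sum(TIER_WEIGHTS[tier] * score for tier, score in results) / total

def verdict(score: float) -> str:
    """Map an aggregate score to one of the five verdict labels."""
    if score < 0.2:
        return "likely_human"
    if score < 0.4:
        return "uncertain"
    if score < 0.6:
        return "possibly_ai"
    if score < 0.8:
        return "likely_ai"
    return "ai_generated"
```

Under these example weights, a strong watermark hit dominates a weak local signal: `aggregate([("watermark", 0.99), ("local_forensic", 0.3)])` is about 0.79.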
Provenance Tracking
- Content is identified by its HMAC-SHA256 hash. Because the keyed hash is deterministic, the same content always produces the same hash, enabling cross-platform tracking without storing the original content.
- Each observation is recorded as a “sighting” with the platform where it was found, a timestamp, and the detection results from that scan.
- Multiple observations of the same content across different users and platforms aggregate into a compound verdict, providing a more robust assessment than any single scan.
- Compound verdicts require a minimum of 3 observations before resolving. This prevents premature judgment based on limited data.
- Near-duplicate matching via perceptual hashing identifies slightly modified versions of the same content. Images use DCT-based pHash (perceptual hash), while text uses SimHash. This allows detection of content that has been cropped, re-encoded, or lightly edited.
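Whatever the hash family (pHash for images, SimHash for text), near-duplicate matching reduces to a Hamming-distance comparison between fixed-width hash values. A minimal sketch, assuming 64-bit hashes and an illustrative distance threshold:

```python
NEAR_DUPLICATE_THRESHOLD = 10  # differing bits out of 64 (assumed value)

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hash values."""
    return bin(a ^ b).count("1")

def is_near_duplicate(hash_a: int, hash_b: int) -> bool:
    """Slightly modified content (cropped, re-encoded, lightly edited)
    typically flips only a few bits of its perceptual hash, so a small
    Hamming distance indicates the same underlying content."""
    return hamming_distance(hash_a, hash_b) <= NEAR_DUPLICATE_THRESHOLD
```

Exact-match HMAC lookup and perceptual-hash comparison are complementary: the former finds byte-identical copies, the latter survives light edits.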
Sampling Limitations
Observatory data reflects what Baloney users choose to scan, not a representative sample of all online content. The following biases should be considered when interpreting aggregate statistics:
- Selection bias: Content is analyzed only when users choose to scan it. Users are more likely to scan content they already suspect is AI-generated, which inflates observed AI rates above the true baseline.
- Platform coverage: Coverage depends on where users are active. Popular platforms (X, Instagram) have more observations than niche platforms, and smaller platforms may not have enough data to produce reliable statistics.
- Language bias: Detection methods are primarily calibrated on English-language content. Accuracy may vary for other languages, and non-English content is underrepresented in the dataset.
- Temporal bias: Detection accuracy may degrade over time as AI generation models improve. Method version tracking enables future re-evaluation of historical results.
- Demographic bias: Observations reflect the demographics of the user base, not the general population. Conclusions about AI content prevalence should be qualified accordingly.
Known Limitations
- AI text detection is probabilistic. No method achieves 100% accuracy. Both false positives (human content flagged as AI) and false negatives (AI content missed) are possible.
- Edited AI content (AI-generated then human-edited) is harder to detect. The more extensively content has been edited, the less likely detection methods are to identify its AI origin.
- Short texts (under 200 characters) have lower detection confidence due to insufficient statistical signal.
- Local-only methods (without commercial API access) have limited accuracy compared to the full pipeline. Confidence caps reflect this limitation.
- Content generated by newer AI models may evade current detection methods until classifiers are updated.
- Platform-specific formatting (e.g., tweets vs. long-form articles) affects detection performance. Very short, highly structured formats provide less signal for analysis.
Data Products
- Observatory API: Provides aggregate metrics per platform, including AI content rates, trends over time, generator distribution, and C2PA compliance rates.
- All published data is aggregated. No individual scans, posts, or authors are identifiable in Observatory outputs.
- Daily and weekly snapshots provide temporal resolution for tracking how AI content prevalence changes over time.
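To make the aggregate-only guarantee concrete, an Observatory record might look like the following. Every field name and value here is hypothetical, not the documented API schema; the point is that only counts and rates appear, never individual scans, posts, or authors.

```python
# Hypothetical shape of one weekly Observatory snapshot record.
snapshot = {
    "platform": "x.com",
    "period": "2026-03-15/2026-03-21",  # weekly temporal resolution
    "observations": 48210,              # aggregate scan count only
    "ai_content_rate": 0.31,            # share of scans with AI-leaning verdicts
    "generator_distribution": {         # shares of identified generators
        "midjourney": 0.42,
        "dall-e": 0.27,
        "other": 0.31,
    },
    "c2pa_compliance_rate": 0.06,       # share carrying Content Credentials
}

# Shares within the generator distribution sum to 1.
assert abs(sum(snapshot["generator_distribution"].values()) - 1.0) < 1e-9
```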
Questions
For questions about our detection methodology or to report a detection issue, contact us at support@baloney.app.