Bias Detector Tool
Dual-Layer Bias Intelligence. An advanced NLP agent that goes beyond simple keyword filtering by combining a strict Sentiment Mapping Registry with a contextual Implicit Bias Framework. This system detects both explicit policy violations (slurs, hate speech) and subtle, systemic issues like attribute bias, double standards, and institutional gatekeeping—providing actionable, educational feedback for safer, more inclusive communication.
Stack: Gemini 3.0 Pro via Google Opal | Dual-Layer Reasoning (System 1 + System 2) | Chain-of-Thought Prompting | Contextual Data Retrieval | HTML/CSS Generation
The Bias Detection Agentic AI Apps
Who is this for?
This is for anyone who wants to check a passage of text or a text document for possible bias (the shorter, the better). There are two agents with nearly identical functions, though the validated version will normally yield more accurate results at the cost of additional time and model compute. You may upload large portions of text (or a full document), but Google Gemini will take much longer to process the request and return the generated results page. If either agent yields an error, refresh and ask it to reprocess the request.
NOTE: Either agent takes time to process the provided text, applying the bias framework and sentiment mappings to identify both explicit and implicit biases.
Persona 1 - The Open-Minded Visitor: Original | Validated
The Scenario: A visitor sees the "Bias Detector" tool, but isn't sure what it does or why they should trust it.
Prompt: "I wanted to flag some concerns about the new engineer, Marcus. While he’s certainly articulate for someone from his background, I’m worried he’s not a culture fit for our high-performance team.
Yesterday, he was being aggressive during the standup meeting when he questioned the legacy code. It felt like he was going off the reservation a bit with his ideas. We need rockstars who are ready to work hard and play hard, not people who are sensitive or low on the totem pole.
Also, the ladies in HR sent over a few more resumes, but most of them seem like diversity hires. One guy looks good on paper, but he’s wheel-chair bound, so I’m not sure he’d be able to keep up with the pace. Let’s blacklist the other candidates for now."
Evaluate - Can it demonstrate the "Dual-Layer" architecture (Sentiment Map vs. Bias Framework) with a clear, high-level example?
Persona 2 - The Curious Colleague: Original | Validated
The Scenario: A potential collaborator or HR leader wants to verify if this tool understands subtle workplace dynamics, specifically around gender bias and double standards.
Prompt: "Sarah has been a helpful support for the team. She is very organized and keeps everyone on track, though she can be a bit emotional when deadlines are tight. In contrast, David is a natural leader who dominates the room and drives hard for results, even if he rubs some people the wrong way."
Evaluate - Can it use the "Flip Test" logic to identify Attribute Bias without relying on explicit slurs?
Persona 3 - The Technical Critic: Original | Validated
The Scenario: A tech lead or recruiter is seeking evidence of this system's ability to identify structural and cultural issues, such as ageism in tech hiring.
Prompt: "We are looking for a digital native to join our high-energy tribe. You should be a rockstar developer with a work-hard-play-hard attitude. We need someone fresh who isn't set in their ways and can crush code targets."
Evaluate - Does it correctly identify "Gatekeeping" and "Institutional Barriers" (Section 2.3), distinct from simple name-calling?
Persona 4 - The Senior Data Scientist: Original | Validated
The Scenario: A Senior Data Scientist wants to stress-test this system's ability to handle coded language and proxy attributes (e.g., race/class signals).
Prompt: "A group of urban youth were loitering near the entrance. They looked suspicious and were loud. Security asked the gang to disperse, but they became aggressive. We need to police this area better to keep the environment clean."
Evaluate - Does it ground its analysis in specific Framework sections (Section 4.2 Coded Language) to explain why "urban" is problematic here?
Persona 5 - The Passionate Hiring Manager: Original | Validated
The Scenario: A hiring manager for an accessible design role wants to see whether this tool advocates inclusive design principles and flags ableism.
Prompt: "This app is designed for normal users who want to walk through the city and see the sights. We need a sanity check on the color scheme, but let's not get too bogged down in edge cases like screen readers right now—we can worry about the blind spots later."
Evaluate - Does it identify the specific "Default Bias" (Section 3) and offer a corrective alternative?
Persona 6 - The Skeptical QA Engineer: Original | Validated
The Scenario: A QA engineer or skeptic wants to ensure the tool isn't "hallucinating" bias in perfectly safe, factual text.
Prompt: "The quarterly review shows a 15% drop in revenue. The marketing team missed their targets, and the engineering lead has requested more resources to fix technical debt. We need a meeting to discuss the roadmap."
Evaluate - Does it correctly return "No Bias Detected" and avoid over-flagging neutral operational language?
Methodology: Dual-Layer Reasoning Architecture
The Bias Detector uses a hybrid cognitive architecture that avoids the common pitfalls of standard AI moderation: missing coded language and over-flagging neutral text. Instead of relying on a single pass, the system splits analysis into two distinct processing layers (a minimal pipeline sketch follows this list):
System 1 (Fast Path): Sentiment Mapping Registry. The input is first scanned against a deterministic registry of known hate speech, slurs, and coded language (derived from the DGHS and MLMA datasets). This ensures immediate, hallucination-free flagging of explicit policy violations.
System 2 (Slow Path): Counterfactual Inference Engine. For text that passes the explicit filter, the system engages a chain-of-thought "Flip Test." It mentally substitutes demographics (e.g., swapping "she" for "he") to detect tonal discrepancies, attribute bias, and double standards that keyword filters miss.
Contextual Grounding: All findings are validated against a custom Implicit Bias Framework (Section 2.3) to distinguish between harmless comments and institutional gatekeeping.
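To make the fast path concrete, here is a minimal Python sketch of what the System 1 registry scan and the hand-off to System 2 could look like. The registry entries, function names, and dispatch logic are illustrative assumptions; the actual Sentiment Mapping Registry is derived from the DGHS and MLMA datasets, and the production agent runs inside Google Opal rather than as local code.

```python
import re

# Illustrative stand-in for the Sentiment Mapping Registry. The real registry is
# derived from the DGHS and MLMA datasets; these entries reuse coded phrases from
# the Persona 1 prompt above purely as examples.
EXPLICIT_REGISTRY = {
    "blacklist": "exclusionary metaphor",
    "off the reservation": "culturally loaded idiom",
    "low on the totem pole": "culturally loaded idiom",
}

def system1_scan(text: str) -> list[dict]:
    """Fast path (System 1): deterministic lookup of known flagged terms."""
    lowered = text.lower()
    hits = []
    for term, category in EXPLICIT_REGISTRY.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append({"term": term, "category": category, "layer": "System 1"})
    return hits

def analyze(text: str) -> dict:
    """Dispatch: explicit hits are flagged immediately; only text that passes the
    explicit filter is escalated to the slower counterfactual layer (System 2)."""
    explicit_hits = system1_scan(text)
    if explicit_hits:
        return {"verdict": "Explicit policy violation", "findings": explicit_hits}
    # Text passed the explicit filter -> escalate to the System 2 "Flip Test",
    # which is sketched later on this page.
    return {"verdict": "Escalated to System 2", "findings": []}

print(analyze("Let's blacklist the other candidates for now."))
```

Because the example terms are drawn from the Persona 1 prompt, running the snippet on that text surfaces them as explicit System 1 hits before any slow-path reasoning is needed.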
Technical Architecture: The Counterfactual Inference Engine
Detecting implicit bias requires more than just a list of banned words. It demands a system capable of semantic reasoning and counterfactual analysis. This tool implements a Dual-Process Cognitive Architecture to audit text for fairness and inclusion:
The Solution: This agent implements a Four-Stage Filtering Protocol on top of that dual-process design, ensuring that every analysis is both fast (for obvious hate speech) and deep (for subtle discrimination):
Stage 1: Deterministic Sentiment Mapping (System 1):
Mechanism: Scans input against a curated High-Velocity Toxicity Registry derived from the Convabuse, MLMA, and DGHS datasets.
Purpose: Provides immediate, zero-latency flagging of slurs, hate symbols, and known dog-whistles.
Stage 2: Counterfactual "Flip Test" (System 2):
Mechanism: Engages a Chain-of-Thought (CoT) reasoning loop that mentally swaps demographic markers (e.g., changing "she" to "he") and then measures the Semantic Drift in the descriptors used (a rough local analogue of this flip-and-compare step is sketched after the Safety & Integrity note below).
Purpose: Detects Attribute Bias (e.g., praising men for "leadership" vs. women for "support") and Institutional Gatekeeping (e.g., "culture fit" as a proxy for exclusion).
Stage 3: Contextual Grounding:
Mechanism: Validates all findings against a custom Implicit Bias Framework (Section 2.3) to differentiate between neutral operational language and systemic exclusion.
Stage 4: Validation (Optional):
Mechanism: Cross-checks all findings against the user's original prompt and polishes the generated report's design, improving both accuracy and the user experience.
Safety & Integrity: To prevent false positives, the architecture includes a Neutrality Guardrail. If the semantic drift during the "Flip Test" is negligible (e.g., a factual business report), the system defaults to a "No Bias Detected" state, ensuring it does not hallucinate problems in safe content.
Prompt Guide for High-Value Results
To generate the most accurate and actionable insights from the Bias Detector, use prompts that challenge the system to find both explicit policy violations and subtle institutional barriers (a sketch of wrapping your own text into one of these templates follows the list):
The "Double Standard" Prompt: "Analyze this performance review for a female employee described as 'emotional' versus a male employee described as 'passionate.' Does the system detect the gendered attribute bias? [add performance review text]"
The "Gatekeeping Audit" Prompt: "Review this job description for a 'digital native' role. Identify any coded language or proxy attributes that might create institutional ageism or exclude protected groups. [add job description text]"
The "Hidden Toxicity" Prompt: "Scan this incident report about 'urban youth' for coded racism and dehumanizing metaphors. Explain how the language might enforce systemic bias even without explicit slurs. [add incident report text]"
Explainer
Technical Insight: This interactive agent represents the second phase of my research-to-product pipeline, built directly on the findings of my Capstone, "Analyzing Hate Speech."
Leveraging Google’s experimental Opal platform, this project demonstrates a Multi-Agent Workflow that orchestrates specialized reasoning loops (System 1 + System 2). While the initial research focused on training SBERT and Random Forest models to classify high-velocity toxicity in political datasets, this implementation operationalizes those insights into a resilient, enterprise-grade Bias Intelligence System. It proves that cutting-edge, experimental agentic frameworks can be engineered to solve real-world problems like institutional bias and exclusionary language.
CFornesa
I am the Multidisciplinary Web Architect.
Copyright © 2025. Chris Fornesa. All rights reserved. Here's my privacy policy.
