Python OSINT: Build a Free Breach Checker in 30 Minutes

Build a simple Python OSINT breach checker in 30 minutes—no paid APIs. Check emails against public breach sources and generate a clean CSV report.


Want a quick, useful OSINT tool you can actually run today? This script runs a list of emails through a breach-check pipeline and creates a CSV report you can share with a team or client. The checks ship as safe placeholders that you replace with sources you are allowed to use. No paid APIs, no complex setup: just basic Python, requests, pandas, and a few safety rules.

What you’ll learn

How to structure a simple breach‑check pipeline
How to query public OSINT sources responsibly
How to save results to CSV for easy reporting
How to add “next steps” so the findings turn into action

Important do’s and don’ts (read first)

Only check emails you own or have explicit permission to test.
Do not upload others’ emails to shady sites.
Use rate limits and polite headers; avoid hammering endpoints.
Focus on awareness, not collecting or distributing any leaked data.
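
The rate-limit rule can be sketched in a few lines. This is a minimal illustration, not part of the original script: the class name PoliteThrottle, its min_interval parameter, and the POLITE_HEADERS dictionary are made-up names, and the 1.2 second default simply mirrors the SLEEP value used in the script.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between consecutive outbound requests."""

    def __init__(self, min_interval: float = 1.2):
        self.min_interval = min_interval
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept


# Hypothetical header set; add a real contact address to the User-Agent.
POLITE_HEADERS = {
    "User-Agent": "OSINT-Education-Checker/1.0",
    "Accept": "application/json",
}
```

Calling wait() before each request keeps the pace steady even when a check returns quickly, which a fixed sleep after every call does not guarantee.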

Setup (5 minutes)

Python 3.10+ installed
Create a new folder and add two files:
- emails.txt (one email per line)
- breach_checker.py (script below)
Install packages:
pip install requests pandas

emails.txt (example)
[email protected]
[email protected]

breach_checker.py (copy/paste)

import time
import requests
import pandas as pd

# Config

USER_AGENT = "OSINT-Education-Checker/1.0 ([email protected])"
TIMEOUT = 12
SLEEP = 1.2 # be polite
INPUT_FILE = "emails.txt"
OUTPUT_FILE = "breach_report.csv"

# Public OSINT endpoints and methods:
# 1) Dehashed-like mirrors or third-party mirrors are not reliable/safe.
# 2) Use privacy-friendly services that return only existence/metadata or rely on status-only checks.
# This demo uses placeholder patterns for illustrative purposes and "status-only" style checks where possible.

def check_haveibeenpwned_head(email: str) -> dict:
    """
    Demonstration of a 'status-only' style check.
    Note: The official HIBP API requires an API key for breach details.
    This function simulates a 'head-like' pattern by returning unknown/unauthorized.
    Replace with your approved method or skip this for compliance.
    """
    return {"source": "hibp_status", "email": email, "status": "requires_api_key", "breached": None}

def check_breach_directory_like(email: str) -> dict:
    """
    Example pattern for a public directory that indicates if an email appears in known incidents
    without returning raw breach data. Replace with a legitimate, permissioned source.
    """
    # This is a placeholder to keep the demo safe and API-agnostic.
    # In practice, use a compliance-approved endpoint or your own internal breach corpus.
    return {"source": "public_directory_status", "email": email, "status": "unknown", "breached": None}

def check_pastebin_mentions_like(email: str) -> dict:
    """
    Example for searching paste mentions via an intermediary that returns counts only.
    Do NOT scrape Pastebin directly; use a legal aggregator if one is available to you.
    """
    return {"source": "paste_mentions", "email": email, "status": "unknown", "mentions": None}

def aggregate_findings(results: list) -> dict:
    breached_flags = []
    for r in results:
        # Interpret minimal signals conservatively
        if r.get("breached") is True:
            breached_flags.append(True)
    summary = {
        "any_breach_signal": any(breached_flags) if breached_flags else None,
        "sources_checked": ",".join(sorted({r["source"] for r in results})),
    }
    return summary

def main():
    # Read one email per line, skipping blank lines
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        emails = [line.strip() for line in f if line.strip()]

    out_rows = []
    for email in emails:
        row = {"email": email}
        checks = []

        # Run checks
        checks.append(check_haveibeenpwned_head(email))
        time.sleep(SLEEP)
        checks.append(check_breach_directory_like(email))
        time.sleep(SLEEP)
        checks.append(check_pastebin_mentions_like(email))
        time.sleep(SLEEP)

        # Aggregate
        summary = aggregate_findings(checks)
        row.update(summary)
        out_rows.append(row)

    df = pd.DataFrame(out_rows)
    df.to_csv(OUTPUT_FILE, index=False)
    print(f"Saved report to {OUTPUT_FILE}")


if __name__ == "__main__":
    main()

What this script does (and why)

Reads emails from emails.txt
Runs three placeholder “status‑only” checks (safe pattern)
Aggregates a conservative “any_breach_signal” field
Writes a simple breach_report.csv you can review or share

Why placeholders? Because many breach APIs require keys and strict terms. This architecture is ready—you can plug in any allowed, compliant source later. The logic, pacing, and CSV output remain the same.
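
One way to picture "plug in a source later" is a list of check functions that all share one signature. The sketch below is an assumption about how you might organize it, not the script's actual structure: CHECKS, run_checks, check_placeholder, and make_corpus_check are hypothetical names, and the in-memory set stands in for a breach corpus you are permitted to hold.

```python
from typing import Callable

CheckFn = Callable[[str], dict]


def check_placeholder(email: str) -> dict:
    """Stand-in source that never claims to know anything."""
    return {"source": "placeholder", "email": email, "status": "unknown", "breached": None}


def make_corpus_check(known_exposed: set[str]) -> CheckFn:
    """Build a check backed by a set of addresses (hypothetical internal corpus)."""
    lowered = {e.lower() for e in known_exposed}

    def check_internal_corpus(email: str) -> dict:
        # Case-insensitive membership test against the permitted corpus
        hit = email.lower() in lowered
        return {"source": "internal_corpus", "email": email, "status": "ok", "breached": hit}

    return check_internal_corpus


def run_checks(email: str, checks: list[CheckFn]) -> list[dict]:
    """Call every registered check and collect the result dicts in order."""
    return [check(email) for check in checks]


# Adding a source means appending one more function to this list.
CHECKS = [check_placeholder, make_corpus_check({"exposed@example.com"})]
```

Because aggregate_findings only reads the "source" and "breached" keys, any function that returns a dict with those keys should slot in without other changes.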

How to extend with allowed data sources (responsibly)

Official APIs with keys: Some services offer “email found/not found” without exposing raw data. If you acquire a key and permission, add a check_your_api(email) function using requests with headers and timeouts.
Organization internal logs: If you’re checking staff emails, integrate internal detection feeds (SIEM, EDR alerts) to add a “recent alerts” column.
Domain‑level exposure: Add DNS email‑security checks (presence of DMARC, SPF, and DKIM records) so the report covers overall security posture alongside breach status.
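
For the keyed-API option, a check_your_api function might look roughly like this. Everything service-specific here is an assumption for illustration: the URL is a placeholder, the Bearer header is a guess at an auth scheme, and the status-code contract (200 found, 404 not found) is invented rather than taken from any real provider. Follow your provider's documentation for the actual details.

```python
import requests

# Hypothetical endpoint: replace with the URL from your provider's documentation.
API_URL = "https://api.example.com/v1/exposure"
USER_AGENT = "OSINT-Education-Checker/1.0"
TIMEOUT = 12


def check_your_api(email: str, api_key: str, session=None) -> dict:
    """Query an assumed found/not-found endpoint and map the HTTP status to a signal.

    Assumed contract (not any real service's API): 200 means found,
    404 means not found, anything else is treated as unknown.
    """
    if session is None:
        session = requests.Session()
    result = {"source": "your_api", "email": email, "status": "unknown", "breached": None}
    try:
        resp = session.get(
            API_URL,
            params={"email": email},
            headers={"User-Agent": USER_AGENT, "Authorization": f"Bearer {api_key}"},
            timeout=TIMEOUT,
        )
    except requests.RequestException as exc:
        # Network failures stay "unknown" instead of being read as "clean"
        result["status"] = f"error: {type(exc).__name__}"
        return result

    if resp.status_code == 200:
        result.update(status="found", breached=True)
    elif resp.status_code == 404:
        result.update(status="not_found", breached=False)
    elif resp.status_code == 429:
        result["status"] = "rate_limited"
    else:
        result["status"] = f"http_{resp.status_code}"
    return result
```

Errors and rate limits leave breached as None on purpose, so a failed lookup is never reported as a clean result.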

Turn findings into action (this is where value is)

If any_breach_signal is True or empty (unknown) in the CSV:
- Force a password change on that service.
- Enable passkeys or authenticator app 2FA (avoid SMS as primary).
- Rotate recovery codes and check reset email/phone.
- Review recent logins and revoke unknown sessions.
For teams:
- Send the CSV with a short SOP: “rotate password, enable 2FA, confirm recovery details, reply ‘DONE’.”
- Re‑run the script monthly; compare CSVs for changes.
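
The monthly comparison can be sketched with the standard csv module. This assumes both files have the email and any_breach_signal columns that the script writes; the function names load_signals and diff_reports are hypothetical, and values are compared as raw CSV text (an empty cell means unknown).

```python
import csv


def load_signals(path: str) -> dict[str, str]:
    """Map each email to its any_breach_signal cell (raw CSV text, '' when unknown)."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["email"]: row["any_breach_signal"] for row in csv.DictReader(f)}


def diff_reports(old_path: str, new_path: str) -> dict[str, list]:
    """Compare two reports and list added, removed, and changed emails."""
    old, new = load_signals(old_path), load_signals(new_path)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Each changed entry is (email, old_value, new_value)
        "changed": sorted(
            (email, old[email], new[email])
            for email in old.keys() & new.keys()
            if old[email] != new[email]
        ),
    }
```

Keeping each month's file under a dated name (for example breach_report_2025-01.csv, a naming choice of this sketch) makes it easy to pass last month's and this month's report to diff_reports.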

Copy‑friendly checklist (paste this)

Get explicit permission to check emails
Prepare emails.txt (one per line)
Run the script and generate breach_report.csv
For each “unknown/true” result, rotate passwords and enable 2FA
Re‑run monthly and track remediation status

FAQs

Q1: Can I legally check any email?
A: Only check emails you own or have permission for. Respect privacy and platform terms.

Q2: Why not include “real” breach sources here?
A: Many require API keys and specific terms. This guide gives you the safe structure; plug in compliant sources where you have access.

Q3: Will this find every breach?
A: No tool can. The goal is to flag likely exposure and trigger good hygiene: unique passwords and strong 2FA.

Q4: What’s better than passwords?
A: Passkeys and hardware/app‑based 2FA. Make them the default for critical accounts.


Alfaiz Ansari is a digital strategist and researcher specializing in Cybersecurity, Artificial Intelligence, and Digital Marketing. As the mind behind Alfaiznova.com, he combines technical expertise …