Python OSINT: Build a Free Breach Checker in 30 Minutes
Want a quick, useful OSINT tool you can actually run today? This script checks a list of emails against public breach indicators and creates a CSV report you can share with a team or client. No paid APIs, no complex setup—just basic Python, requests, and a few safety rules.
What you’ll learn
-
How to structure a simple breach‑check pipeline
-
How to query public OSINT sources responsibly
-
How to save results to CSV for easy reporting
-
How to add “next steps” so the findings turn into action
Important do’s and don’ts (read first)
-
Only check emails you own or have explicit permission to test.
-
Do not upload others’ emails to shady sites.
-
Use rate limits and polite headers; avoid hammering endpoints.
-
Focus on awareness, not collecting or distributing any leaked data.
Setup (5 minutes)
-
Python 3.10+ installed
-
Create a new folder and add two files:
-
emails.txt (one email per line)
-
breach_checker.py (script below)
-
-
Install packages:
pip install requests pandas
emails.txt (example)
user1@example.com
user2@example.com
breach_checker.py (copy/paste)
import time
import requests
import pandas as pd
Config
USER_AGENT = "OSINT-Education-Checker/1.0 (+contact@example.com)"
TIMEOUT = 12
SLEEP = 1.2 # be polite
INPUT_FILE = "emails.txt"
OUTPUT_FILE = "breach_report.csv"
Public OSINT endpoints and methods:
1) Dehashed-like mirrors or third-party mirrors are not reliable/safe.
2) Use privacy-friendly services that return only existence/metadata or rely on status-only checks.
This demo uses placeholder patterns for illustrative purposes and "status-only" style checks where possible.
def check_haveibeenpwned_head(email: str) -> dict:
"""
Demonstration of a 'status-only' style check.
Note: The official HIBP API requires an API key for breach details.
This function simulates a 'head-like' pattern by returning unknown/unauthorized.
Replace with your approved method or skip this for compliance.
"""
return {"source": "hibp_status", "email": email, "status": "requires_api_key", "breached": None}
def check_breach_directory_like(email: str) -> dict:
"""
Example pattern for a public directory that indicates if an email appears in known incidents
without returning raw breach data. Replace with a legitimate, permissioned source.
"""
# This is a placeholder to keep the demo safe and API-agnostic.
# In practice, use a compliance-approved endpoint or your own internal breach corpus.
return {"source": "public_directory_status", "email": email, "status": "unknown", "breached": None}
def check_pastebin_mentions_like(email: str) -> dict:
"""
Example for searching paste mentions via an intermediary that returns counts only.
Do NOT scrape Pastebin directly; use a legal aggregator if one is available to you.
"""
return {"source": "paste_mentions", "email": email, "status": "unknown", "mentions": None}
def aggregate_findings(results: list) -> dict:
breached_flags = []
for r in results:
# Interpret minimal signals conservatively
if r.get("breached") is True:
breached_flags.append(True)
summary = {
"any_breach_signal": any(breached_flags) if breached_flags else None,
"sources_checked": ",".join(sorted({r["source"] for r in results})),
}
return summary
def main():
with open(INPUT_FILE, "r", encoding="utf-8") as f:
emails = [line.strip() for line in f if line.strip()]
textout_rows = [] for email in emails: row = {"email": email} checks = [] # Run checks checks.append(check_haveibeenpwned_head(email)) time.sleep(SLEEP) checks.append(check_breach_directory_like(email)) time.sleep(SLEEP) checks.append(check_pastebin_mentions_like(email)) time.sleep(SLEEP) # Aggregate summary = aggregate_findings(checks) row.update(summary) out_rows.append(row) df = pd.DataFrame(out_rows) df.to_csv(OUTPUT_FILE, index=False) print(f"Saved report to {OUTPUT_FILE}")
if name == "main":
main()
What this script does (and why)
-
Reads emails from emails.txt
-
Runs three placeholder “status‑only” checks (safe pattern)
-
Aggregates a conservative “any_breach_signal” field
-
Writes a simple breach_report.csv you can review or share
Why placeholders? Because many breach APIs require keys and strict terms. This architecture is ready—you can plug in any allowed, compliant source later. The logic, pacing, and CSV output remain the same.
How to extend with allowed data sources (responsibly)
-
Official APIs with keys: Some services offer “email found/not found” without exposing raw data. If you acquire a key and permission, add a check_your_api(email) function using requests with headers and timeouts.
-
Organization internal logs: If you’re checking staff emails, integrate internal detection feeds (SIEM, EDR alerts) to add a “recent alerts” column.
-
Domain‑level exposure: Add DNS/email security checks (DMARC/ SPF/ DKIM presence) to strengthen overall security posture along with breach status.
Turn findings into action (this is where value is)
-
If any_breach_signal is true or unknown:
-
Force a password change on that service.
-
Enable passkeys or authenticator app 2FA (avoid SMS as primary).
-
Rotate recovery codes and check reset email/phone.
-
Review recent logins and revoke unknown sessions.
-
-
For teams:
-
Send the CSV with a short SOP: “rotate password, enable 2FA, confirm recovery details, reply ‘DONE’.”
-
Re‑run the script monthly; compare CSVs for changes.
-
Copy‑friendly checklist (paste this)
-
Get explicit permission to check emails
-
Prepare emails.txt (one per line)
-
Run the script and generate breach_report.csv
-
For each “unknown/true” result, rotate passwords and enable 2FA
-
Re‑run monthly and track remediation status
FAQs
Q1: Can I legally check any email?
A: Only check emails you own or have permission for. Respect privacy and platform terms.
Q2: Why not include “real” breach sources here?
A: Many require API keys and specific terms. This guide gives you the safe structure; plug in compliant sources where you have access.
Q3: Will this find every breach?
A: No tool can. The goal is to flag likely exposure and trigger good hygiene: unique passwords and strong 2FA.
Q4: What’s better than passwords?
A: Passkeys and hardware/app‑based 2FA. Make them the default for critical accounts.
CTA
Want a copy‑paste “remediation tracker” Google Sheet with status and due dates? Comment “REMEDIATE” and I’ll add a linkable template to your post. alfaiznova.com
Join the conversation