Open-Source Intelligence (OSINT) — Practical Guide for Cyber Reconnaissance
Open-Source Intelligence (OSINT) is the practice of collecting and analyzing publicly available information to gain insights into a target — be it an organization, infrastructure, or digital footprint. In cybersecurity, OSINT is critical for attack surface mapping, threat intelligence, vulnerability research, and incident response.

This guide outlines how to conduct lawful and efficient OSINT for penetration testing, red teaming, or bug bounty reconnaissance, complete with safe tools and command-line workflows.
Note: All examples and commands are for authorized testing and research only. Never use these techniques against systems or individuals without explicit permission.
1. Understanding OSINT
OSINT isn’t just Googling a company’s name — it’s a structured intelligence process:
- Define objective: What’s the mission? (e.g., identify subdomains or exposed assets)
- Collect data: Use passive sources like archives, search engines, and public databases.
- Process and enrich: Correlate data from multiple sources.
- Analyze: Identify patterns, risks, and relationships.
- Report: Present findings clearly, with actionable insights.
2. Legal & Ethical Framework
OSINT operates within the boundaries of law and ethics. Key principles include:
- Collect only publicly available information.
- Respect Terms of Service for all platforms.
- Avoid scanning or brute-forcing without authorization.
- Protect personal data — don’t disclose private information unless there’s a clear, lawful reason.
- Always maintain an audit trail of your actions and evidence.
3. OSINT Toolkit (with Real Commands)
Web & Archive Discovery
- gau – Retrieve archived URLs from sources like Wayback Machine:
gau example.com > urls.txt
- waybackurls – Get historical snapshots:
waybackurls example.com > wayback.txt
- curl – Download and inspect a page:
curl -sL 'https://example.com/page' -o page.html
Subdomain & DNS Enumeration
- amass – Comprehensive subdomain discovery:
amass enum -d example.com -o amass.txt
- subfinder – Fast passive subdomain finder:
subfinder -d example.com -o subs.txt
- dig – Check DNS records:
dig +short example.com ANY
Passive Service Discovery
- Shodan – Search internet-exposed devices:
shodan search --fields ip_str,port,org "hostname:example.com"
- Censys – Identify hosts via certificate and banner data.
Passive reconnaissance avoids triggering alarms or violating access policies.
Email, Identity & Leak Search
- theHarvester – Gather emails, hosts, and subdomains:
theHarvester -d example.com -b all -l 500
- HaveIBeenPwned API – Check if emails appear in breaches.
- Hunter.io – Find corporate email patterns and contacts.
URL Filtering and Validation
Use gf and httpx to detect live endpoints with interesting parameters:
cat urls.txt | gf lfi | httpx -threads 50 -o live_lfi.txt
Social Media & People Search
- LinkedIn, GitHub, Twitter, Facebook – Manual and API-based searches.
- Maltego – Visual link analysis between identities, domains, and organizations.
- SpiderFoot – Automated OSINT platform for aggregation and enrichment.
Always use dummy or research accounts. Never engage with targets directly.
4. Google Dorking for Discovery
Example Google queries (dorks) for reconnaissance:
- Find admin panels:
site:example.com intitle:"login"
- Find configuration files:
site:example.com ext:conf OR ext:env OR ext:ini
- Find exposed documents:
site:example.com filetype:pdf OR filetype:xlsx
- GitHub code exposure:
org:example "AWS_SECRET_ACCESS_KEY"
Use responsibly. Avoid accessing or downloading sensitive files from external sources.
5. Certificate Transparency & Domain Correlation
Use crt.sh to find hidden subdomains:
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value'
Use the output to feed enumeration tools like amass or subfinder.
6. Timeline Reconstruction
When conducting threat or leak analysis:
- Record timestamps (archive date, WHOIS updates, commit history).
- Normalize to UTC (ISO 8601 format).
- Correlate key events — e.g., domain registration → GitHub repo creation → credential leak.
- Store immutable evidence (hash files, archive URLs).
7. OPSEC for OSINT Investigators
- Use VPNs or dedicated research environments (VMs, containers).
- Isolate browser sessions per target.
- Avoid personal accounts or cookies.
- Maintain anonymous accounts for testing (within legal limits).
- Never “like,” “follow,” or interact with targets under observation.
8. Common OSINT Workflows
A. Find Public S3 Buckets
grep -Eo "s3\.amazonaws\.com/[^\"' ]+" urls.txt | sort -u
B. Enumerate Subdomains via Certificates
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value'
C. Shodan for Open Ports
shodan search --fields ip,port,org "ssl.cert.subject.CN:example.com"
D. Collect Paste Leaks (manual or via APIs)
Use paste search engines or breach monitoring services — not public dumps.
9. OSINT Reporting Template
Title: Exposed Backup File on Public Server Scope: example.com (and related subdomains) Impact: Publicly accessible SQL backup containing user data. Evidence: URL, screenshots, timestamps, SHA256 hash of downloaded file. Recommendation: Restrict directory listing, remove backup, rotate database credentials.
10. Key Red Flags
- Publicly accessible credentials (
.env,.git, or config files). - Leaked databases or S3 buckets.
- Exposed admin panels or dev subdomains.
- Sensitive information on GitHub (tokens, private repos).
Treat these as high-priority disclosures — report via official bug bounty or responsible disclosure channels.
11. Final Thoughts
OSINT is not hacking — it’s structured intelligence work. When used responsibly, it strengthens cybersecurity, supports investigations, and improves digital resilience. Your power as an OSINT analyst lies in curiosity, discipline, and respect for privacy.