How LegitGuard Works
Overview
LegitGuard uses pattern-based security analysis algorithms that run entirely on your device. When you click the LegitGuard icon, it analyzes the current webpage's URL and domain name to identify potentially suspicious patterns commonly used in phishing attacks and scams.
Technical Architecture
Extension Structure
LegitGuard is a Safari Web Extension with the following components:
- Popup Interface (popup.html, popup.js, popup.css): The user interface that displays analysis results
- Background Service Worker (background.js): Handles extension lifecycle events
- Content Script (content.js): Optional content script for future enhancements
- Manifest (manifest.json): Extension configuration and permissions
Analysis Flow
- User Action: User clicks the LegitGuard icon in Safari's toolbar
- URL Retrieval: Extension gets the current tab's URL using Safari's tabs.query() API
- Local Analysis: Security checks are performed locally using JavaScript
- Result Display: Analysis results are displayed in the popup interface
- No External Communication: No data is sent to external servers
Security Checks Explained
1. HTTPS Check
What it does: Verifies if the website uses HTTPS encryption.
How it works:
- Checks if the URL protocol is https:
- Flags HTTP connections as potentially unsafe
Why it matters: Unencrypted connections can expose your data to interception.
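A minimal sketch of this check, assuming the URL is parsed with the browser's built-in URL API. The function name checkHttps and its return shape are illustrative assumptions, not LegitGuard's actual code.

```javascript
// Hypothetical helper: name and return shape are assumptions for illustration.
function checkHttps(urlString) {
  const url = new URL(urlString); // throws a TypeError on malformed input
  // URL.protocol includes the trailing colon, e.g. "https:" or "http:".
  return { name: "https", passed: url.protocol === "https:" };
}
```

Calling checkHttps("http://example.com") would return an object with passed set to false.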
2. IP Address Check
What it does: Detects if the website uses an IP address instead of a domain name.
How it works:
- Uses regex to detect IPv4 addresses (e.g., 192.168.1.1)
- Flags IPv6 addresses
- Legitimate sites rarely use IP addresses directly
Why it matters: Phishing sites often use IP addresses to avoid domain reputation systems.
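One way this check could look, as a sketch. The pattern and the function name isIpAddressHost are assumptions. The sketch relies on the URL API reporting IPv6 hosts wrapped in square brackets.

```javascript
// Dotted-decimal IPv4 with each octet limited to 0-255 (illustrative pattern).
const IPV4_PATTERN =
  /^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$/;

// Hypothetical helper: returns true when the host is a literal IP address.
function isIpAddressHost(urlString) {
  const { hostname } = new URL(urlString);
  // The URL API returns IPv6 hosts in brackets, e.g. "[2001:db8::1]".
  const isIpv6 = hostname.startsWith("[") && hostname.endsWith("]");
  return isIpv6 || IPV4_PATTERN.test(hostname);
}
```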
3. Punycode Detection
What it does: Identifies internationalized domain names (IDN) that use Punycode encoding.
How it works:
- Checks for the xn-- prefix in the domain
- This indicates non-ASCII characters in the domain name
Why it matters: Attackers use similar-looking characters from different scripts (homoglyphs) to create fake domains.
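A short sketch of the prefix test, assuming it runs on every dot-separated label so that an encoded label in a subdomain is also caught. The name hasPunycodeLabel is hypothetical.

```javascript
// Hypothetical helper: true if any label of the hostname is Punycode-encoded.
function hasPunycodeLabel(hostname) {
  return hostname
    .toLowerCase()
    .split(".")
    .some((label) => label.startsWith("xn--"));
}
```

For example, hasPunycodeLabel("xn--e1afmkfd.com") returns true, while hasPunycodeLabel("example.com") returns false.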
4. Brand Similarity (Typosquatting)
What it does: Detects domains that are similar to known brands, indicating potential typosquatting.
How it works:
- Uses the Levenshtein distance algorithm to calculate edit distance between the domain and 64+ known brands
- Flags domains within 2 character edits of a known brand
- Examples: paypall.com (1 edit from paypal), arnazon.com (2 edits from amazon)
Algorithm Details:
Levenshtein distance calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another. Example:
- "paypal" vs "paypall" = distance 1 (one insertion)
- "amazon" vs "arnazon" = distance 2 (one substitution plus one insertion, because "rn" replaces "m")
Why it matters: Typosquatting is a common phishing technique where attackers register domains similar to legitimate brands.
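The edit-distance comparison can be sketched as follows. This is the standard dynamic programming formulation, not necessarily LegitGuard's exact code. The three-entry KNOWN_BRANDS list and the choice to compare only the first domain label are assumptions made for illustration.

```javascript
// Levenshtein distance using two rolling rows of the DP table.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // deleting i chars from a yields the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const KNOWN_BRANDS = ["paypal", "amazon", "google"]; // illustrative subset only

// Sketch: returns the closest brand within 2 edits, or null.
function checkTyposquatting(domain) {
  const name = domain.toLowerCase().split(".")[0]; // simplification: first label only
  for (const brand of KNOWN_BRANDS) {
    const distance = levenshtein(name, brand);
    // distance 0 is the brand itself, so it is not treated as typosquatting
    if (distance > 0 && distance <= 2) return { brand, distance };
  }
  return null;
}
```

With this sketch, checkTyposquatting("paypall.com") reports the brand paypal at distance 1, and an exact match such as paypal.com returns null.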
5. TLD Analysis
What it does: Flags suspicious top-level domains (TLDs) commonly used in scams.
How it works:
- Maintains a list of suspicious TLDs: .tk, .ml, .ga, .cf, .gq, .xyz, .top, .click, .zip, .mov
- Checks if the domain uses one of these TLDs
Why it matters: Some TLDs are frequently used for spam and phishing due to low cost or lax registration policies.
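A sketch of the lookup using the TLDs listed above. The function name hasSuspiciousTld and the handling of a trailing dot are assumptions.

```javascript
const SUSPICIOUS_TLDS = new Set([
  "tk", "ml", "ga", "cf", "gq", "xyz", "top", "click", "zip", "mov",
]);

// Hypothetical helper: true when the last label is on the suspicious list.
function hasSuspiciousTld(hostname) {
  // Drop an optional trailing dot ("example.tk.") before taking the last label.
  const tld = hostname.toLowerCase().replace(/\.$/, "").split(".").pop();
  return SUSPICIOUS_TLDS.has(tld);
}
```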
6. Domain Pattern Analysis
What it does: Detects unusual domain name patterns.
How it works:
- Counts hyphens in the domain (flags domains with >2 hyphens)
- Checks domain length (flags domains >30 characters)
- Identifies patterns that deviate from normal domain structures
Why it matters: Legitimate domains typically have simple, readable structures. Excessive complexity can indicate suspicious intent.
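The two thresholds above could be applied like this. The function name and the result fields are illustrative assumptions.

```javascript
// Hypothetical helper applying the hyphen and length thresholds.
function checkDomainPattern(hostname) {
  const hyphenCount = (hostname.match(/-/g) || []).length;
  return {
    tooManyHyphens: hyphenCount > 2, // more than 2 hyphens
    tooLong: hostname.length > 30,   // more than 30 characters
  };
}
```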
7. Numeric Pattern Check
What it does: Identifies suspicious numeric patterns in domain names.
How it works:
- Detects random numbers appended/prepended to text (e.g., paypal-123.com)
- Identifies date-like patterns (YYYYMMDD, YYMMDD)
- Calculates numeric ratio and flags domains with >30% numeric characters
Why it matters: Phishing domains often include random numbers to create unique variations of brand names.
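A rough sketch of the three tests described above. The regular expressions are assumptions about how such patterns might be written, and the ratio is computed over the hostname without its TLD, which is also an assumption.

```javascript
// Hypothetical helper: all patterns below are illustrative, not LegitGuard's own.
function checkNumericPatterns(hostname) {
  // Analyze everything except the final label (the TLD).
  const name = hostname.toLowerCase().split(".").slice(0, -1).join(".");
  const digits = (name.match(/\d/g) || []).length;
  return {
    // More than 30% of the characters are digits.
    numericRatioHigh: name.length > 0 && digits / name.length > 0.3,
    // YYYYMMDD (years 19xx/20xx) or YYMMDD, not embedded in a longer digit run.
    dateLike: /(?:^|\D)(?:(?:19|20)\d{2}|\d{2})(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])(?:\D|$)/.test(name),
    // Digits at the very start or very end, next to letters (e.g. "paypal-123").
    affixedDigits: /^\d+-?[a-z]|[a-z]-?\d+$/.test(name),
  };
}
```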
8. Mixed Script Detection
What it does: Detects domains mixing different character scripts (homoglyph attacks).
How it works:
- Checks for multiple character scripts in the same domain:
  - Latin (a-z, A-Z)
  - Cyrillic (а-я, А-Я)
  - Greek (α-ω, Α-Ω)
  - Arabic (ا-ي)
  - Chinese (一-龯)
  - Japanese (ひらがな, カタカナ, 漢字)
- Flags domains with 2+ scripts
Why it matters: Attackers use look-alike characters from different scripts to create domains that appear identical to legitimate ones (e.g., using Cyrillic 'а' instead of Latin 'a').
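A sketch of script detection. Instead of the explicit character ranges listed above, it uses Unicode script property escapes, which is a swapped-in technique. It assumes the input is a single domain label already decoded to Unicode (not its xn-- form), and it counts Han, Hiragana, and Katakana as separate scripts, which is a simplification.

```javascript
// One pattern per script, using Unicode property escapes (requires the "u" flag).
const SCRIPT_PATTERNS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
  Arabic: /\p{Script=Arabic}/u,
  Han: /\p{Script=Han}/u,
  Hiragana: /\p{Script=Hiragana}/u,
  Katakana: /\p{Script=Katakana}/u,
};

// Returns the names of all scripts that appear in the label.
function detectScripts(label) {
  return Object.keys(SCRIPT_PATTERNS).filter((name) =>
    SCRIPT_PATTERNS[name].test(label)
  );
}

// Sketch: flags a label that mixes 2 or more scripts.
// Digits and hyphens belong to the "Common" script and are ignored.
function checkMixedScripts(label) {
  return detectScripts(label).length >= 2;
}
```

A label such as "pаypal" written with a Cyrillic "а" (U+0430) contains both Latin and Cyrillic characters, so this sketch would flag it.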
9. Random String Detection
What it does: Identifies domains with random-looking character sequences.
How it works:
- Estimates randomness from vowel/consonant ratios, used as a rough proxy for character entropy
- Flags domains with high numeric ratios (>40%)
- Detects very low vowel ratios (<15%) combined with numbers
- Identifies repeated character patterns (e.g., aaaaa.com)
Why it matters: Legitimate domains typically use readable words, while random strings often indicate automated or suspicious domain generation.
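The ratios above could be computed along these lines. This is a sketch: the input is assumed to be a single domain label, and the "4 or more identical characters in a row" rule for repeated patterns is an assumed threshold that the text does not specify.

```javascript
// Sketch of the ratio tests; thresholds not stated in the text are assumptions.
function checkRandomString(label) {
  const name = label.toLowerCase();
  const letters = (name.match(/[a-z]/g) || []).length;
  const vowels = (name.match(/[aeiou]/g) || []).length;
  const digits = (name.match(/\d/g) || []).length;
  const numericRatio = name.length ? digits / name.length : 0;
  const vowelRatio = letters ? vowels / letters : 0;
  return {
    highNumericRatio: numericRatio > 0.4,
    lowVowelsWithDigits: letters > 0 && vowelRatio < 0.15 && digits > 0,
    repeatedChars: /(.)\1{3,}/.test(name), // same character 4+ times in a row (assumed)
  };
}
```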
10. Port Number Analysis
What it does: Detects non-standard port numbers in URLs.
How it works:
- Checks if the URL contains a port number other than 80 (HTTP) or 443 (HTTPS)
- Flags common development ports (3000, 8080, 8443, etc.) as potentially suspicious
- Standard web traffic uses ports 80/443
Why it matters: Non-standard ports can indicate:
- Development/testing environments
- Attempts to bypass security measures
- Suspicious hosting setups
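A minimal sketch of the port test using the URL API. The function name checkPort is an assumption. Note that URL.port is an empty string whenever the URL uses the default port for its scheme, so an explicit :443 on an https URL is not reported.

```javascript
// Hypothetical helper: reports the explicit port and whether it is non-standard.
function checkPort(urlString) {
  const url = new URL(urlString);
  // "" means no port was given, or it matched the scheme's default.
  if (url.port === "") return { port: null, nonStandard: false };
  const port = Number(url.port);
  return { port, nonStandard: port !== 80 && port !== 443 };
}
```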
11. Phishing Keyword Detection
What it does: Identifies common phishing and scam keywords in domain names.
How it works:
- Scans for phishing keywords: secure-, verify-, update-, account-, login-, support-, service-, confirm-, validate-, etc.
- Detects scam words: free-, win-, prize-, claim-, urgent-, limited-, exclusive-
- Special detection when combined with brand names
Why it matters: Legitimate brands rarely use these keywords in their domain names. They're commonly used in phishing attempts to create urgency or mimic legitimate services.
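A sketch of keyword matching. It splits the hostname on dots and hyphens and compares whole tokens, which is an assumed matching strategy. The two-entry BRANDS list and the function name checkKeywords are illustrative only.

```javascript
const PHISHING_KEYWORDS = [
  "secure", "verify", "update", "account", "login",
  "support", "service", "confirm", "validate",
];
const SCAM_KEYWORDS = ["free", "win", "prize", "claim", "urgent", "limited", "exclusive"];
const BRANDS = ["paypal", "amazon"]; // illustrative subset only

// Hypothetical helper: finds keyword tokens and notes whether a brand appears too.
function checkKeywords(hostname) {
  const tokens = hostname.toLowerCase().split(/[.-]/);
  const phishing = tokens.filter((t) => PHISHING_KEYWORDS.includes(t));
  const scam = tokens.filter((t) => SCAM_KEYWORDS.includes(t));
  const hasBrand = tokens.some((t) => BRANDS.includes(t));
  return {
    phishing,
    scam,
    combinedWithBrand: hasBrand && phishing.length + scam.length > 0,
  };
}
```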
Risk Scoring
LegitGuard assigns risk points to each failed check:
- High Impact: 4-5 points (e.g., IP addresses, mixed scripts)
- Medium Impact: 3 points (e.g., typosquatting, phishing keywords)
- Low Impact: 2 points (e.g., suspicious TLDs, numeric patterns)
Risk Levels:
- Low Risk (0-1 points): "Looks Safe" - No significant issues detected
- Medium Risk (2-3 points): "Be Careful" - Some suspicious patterns detected
- High Risk (4+ points): "High Risk" - One high-impact indicator or several lower-impact ones
The maximum risk score is capped at 10 points.
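The scoring rules above can be sketched as follows. The shape of each check result ({ passed, points }) is an assumption. The thresholds, labels, and cap come from the text.

```javascript
const MAX_SCORE = 10;

// Sketch: sums the points of failed checks, caps the total, and maps it to a level.
function calculateRiskScore(checks) {
  // Each check is assumed to look like { passed: boolean, points: number }.
  const raw = checks
    .filter((check) => !check.passed)
    .reduce((sum, check) => sum + check.points, 0);
  const score = Math.min(raw, MAX_SCORE);

  let level = "Looks Safe";               // 0-1 points
  if (score >= 4) level = "High Risk";    // 4+ points
  else if (score >= 2) level = "Be Careful"; // 2-3 points
  return { score, level };
}
```

For example, a single failed low-impact check worth 2 points yields "Be Careful", and failed checks worth 5, 4, and 3 points yield a capped score of 10.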
Privacy Implementation
Local Processing
All analysis happens in the browser extension's JavaScript context:
- URL is obtained using browser.tabs.query({ active: true, currentWindow: true })
- URL is parsed using the browser's built-in URL API
- Domain is extracted and analyzed using string operations and regex
- Results are displayed in the popup
- No data persists after the popup closes
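The steps above can be sketched as a small helper. The name getActiveTabHostname is hypothetical, and the tabs API is passed in as a parameter (an assumption made so the function can be exercised outside the extension). In the popup it would be called with browser.tabs.

```javascript
// Hypothetical helper: reads the active tab's URL and returns its hostname.
// tabsApi is expected to behave like browser.tabs (query returns a Promise of tabs).
async function getActiveTabHostname(tabsApi) {
  const [tab] = await tabsApi.query({ active: true, currentWindow: true });
  if (!tab || !tab.url) return null; // no active tab, or URL not available
  try {
    return new URL(tab.url).hostname;
  } catch {
    return null; // the tab URL could not be parsed
  }
}

// Possible usage inside popup.js:
//   getActiveTabHostname(browser.tabs).then((hostname) => { /* run checks */ });
```

Nothing in this sketch writes to storage or makes a network request. The hostname exists only in memory for the lifetime of the popup.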
No External Communication
LegitGuard:
- Makes zero external HTTP requests
- Doesn't use any remote APIs or web services
- Doesn't communicate with external servers
- Works completely offline
Data Lifecycle
- Acquisition: Current tab URL is read (required permission)
- Processing: Analysis happens in memory
- Display: Results shown in popup
- Discard: All data is garbage collected when popup closes
- No Storage: Nothing is saved to disk or local storage
Limitations and Considerations
Pattern-Based Detection
LegitGuard uses heuristics and pattern matching, which means:
- False Positives: Legitimate sites may be flagged if they match suspicious patterns
- False Negatives: Sophisticated attacks may not match known patterns
- No Real-Time Intelligence: Doesn't use threat feeds or blacklists
- No Historical Data: Doesn't track domain age, reputation, or history
Algorithm Constraints
- Brand Database: Limited to 64 brands (expanded from original 9)
- TLD List: Uses a curated list of suspicious TLDs (may miss new ones)
- Keyword Lists: Static lists that may not cover all phishing tactics
- Levenshtein Threshold: Set to 2 edits (may miss more sophisticated typosquatting)
Use Cases
LegitGuard is best suited for:
- Quick identification of obvious phishing attempts
- Pattern-based threat detection
- Privacy-conscious users who want local analysis
- Offline security checking
Not ideal for:
- Comprehensive threat intelligence
- Real-time blacklist checking
- Enterprise security monitoring
- Advanced persistent threat (APT) detection
Performance Characteristics
Speed
- Analysis Time: Typically < 100ms
- No Network Latency: All processing is local
- Instant Results: No waiting for external services
Resource Usage
- Memory: Minimal - only processes current URL
- CPU: Low - pattern matching is efficient
- Battery: Negligible - only runs when user clicks icon
- Storage: Extension size only (~500KB)
Scalability
- Brand Database: Can easily expand to 100+ brands
- Algorithm Complexity: All algorithms are O(n) or O(n²) at worst
- Maintainability: Modular check functions allow easy addition of new checks
Future Enhancements
Planned improvements include:
- Additional security checks
- Enhanced typosquatting detection
- Improved pattern matching
- Performance optimizations
Technical Details for Developers
Code Structure
// Main analysis flow
analyzeURL(urlString)
→ performSecurityChecks(url)
→ Individual check functions
→ Return check results
→ calculateRiskScore(checks)
→ displayResults(analysis)
Key Functions
- performSecurityChecks(url): Orchestrates all 11 security checks
- checkTyposquatting(domain): Levenshtein distance algorithm
- checkMixedScripts(domain): Unicode range detection
- checkRandomString(domain): Entropy analysis
- calculateRiskScore(checks): Aggregates risk points
Algorithms Used
- Levenshtein Distance: Dynamic programming algorithm for edit distance
- Regex Pattern Matching: For domain structure analysis
- Unicode Range Detection: For script identification
- Statistical Analysis: Character frequency for entropy calculation