LegitGuard Safari Extension

Updated December 25, 2025

How LegitGuard Works

Overview

LegitGuard uses pattern-based security analysis algorithms that run entirely on your device. When you click the LegitGuard icon, it analyzes the current webpage's URL and domain name to identify potentially suspicious patterns commonly used in phishing attacks and scams.

Technical Architecture

Extension Structure

LegitGuard is a Safari Web Extension with the following components:

  • Popup Interface (popup.html, popup.js, popup.css): The user interface that displays analysis results
  • Background Service Worker (background.js): Handles extension lifecycle events
  • Content Script (content.js): Optional content script for future enhancements
  • Manifest (manifest.json): Extension configuration and permissions

Analysis Flow

  1. User Action: User clicks the LegitGuard icon in Safari's toolbar
  2. URL Retrieval: The extension gets the current tab's URL using the browser.tabs.query() WebExtensions API
  3. Local Analysis: Security checks are performed locally using JavaScript
  4. Result Display: Analysis results are displayed in the popup interface
  5. No External Communication: No data is sent to external servers

Security Checks Explained

1. HTTPS Check

What it does: Verifies if the website uses HTTPS encryption.

How it works:

  • Checks if the URL protocol is https:
  • Flags HTTP connections as potentially unsafe

Why it matters: Unencrypted connections can expose your data to interception.
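
As a rough illustration of the protocol comparison described above, here is a minimal sketch. The function name checkHttps and the shape of the returned object are assumptions made for this example, not LegitGuard's actual code.

```javascript
// Hypothetical helper: reports whether a URL uses the https: scheme.
function checkHttps(urlString) {
  const url = new URL(urlString);
  // URL.protocol includes the trailing colon, e.g. "https:".
  const passed = url.protocol === "https:";
  return {
    name: "https",
    passed,
    message: passed
      ? "Connection uses HTTPS"
      : "Connection is not encrypted",
  };
}

console.log(checkHttps("http://example.com/login").passed);
```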

2. IP Address Check

What it does: Detects if the website uses an IP address instead of a domain name.

How it works:

  • Uses regex to detect IPv4 addresses (e.g., 192.168.1.1)
  • Flags IPv6 addresses
  • Legitimate sites rarely use IP addresses directly

Why it matters: Phishing sites often use IP addresses to avoid domain reputation systems.
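
One way such a check could look is sketched below. It assumes the hostname comes from the built-in URL API, which wraps IPv6 literals in square brackets. The names IPV4_PATTERN and isIpAddressHost are hypothetical.

```javascript
// Four dot-separated groups of 1-3 digits; the range is validated separately.
const IPV4_PATTERN = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

// Hypothetical helper: true when the hostname is an IP literal.
function isIpAddressHost(hostname) {
  // The URL API returns IPv6 hosts in brackets, e.g. "[2001:db8::1]".
  if (hostname.startsWith("[") && hostname.endsWith("]")) {
    return true;
  }
  const match = IPV4_PATTERN.exec(hostname);
  if (!match) {
    return false;
  }
  // Every octet must fit in 0-255 to be a plausible IPv4 address.
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isIpAddressHost(new URL("http://192.168.1.1/login").hostname));
```

Validating the octet range avoids treating version-like hostnames such as 999.1.1.1 as addresses, though whether LegitGuard does this is not stated.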

3. Punycode Detection

What it does: Identifies internationalized domain names (IDN) that use Punycode encoding.

How it works:

  • Checks for xn-- prefix in the domain
  • This indicates non-ASCII characters in the domain name

Why it matters: Attackers use similar-looking characters from different scripts (homoglyphs) to create fake domains.
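
A minimal sketch of the prefix test follows. It checks each dot-separated label rather than the whole string, which is an assumption about how the check is scoped. The function name hasPunycodeLabel is hypothetical.

```javascript
// Hypothetical helper: true when any label of the hostname is Punycode-encoded.
// The URL API exposes internationalized hostnames in their ASCII ("xn--") form,
// so a hostname taken from `new URL(...).hostname` can be passed in directly.
function hasPunycodeLabel(hostname) {
  return hostname
    .toLowerCase()
    .split(".")
    .some((label) => label.startsWith("xn--"));
}

console.log(hasPunycodeLabel("xn--e1afmkfd.example"));
```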

4. Brand Similarity (Typosquatting)

What it does: Detects domains that are similar to known brands, indicating potential typosquatting.

How it works:

  • Uses the Levenshtein distance algorithm to calculate edit distance between the domain and 64+ known brands
  • Flags domains within 2 character edits of a known brand
  • Examples: paypall.com (1 edit from paypal), arnazon.com (2 edits from amazon)

Algorithm Details:

Levenshtein distance calculates the minimum number of single-character edits
(insertions, deletions, or substitutions) required to change one word into another.

Example:
- "paypal" vs "paypall" = distance 1 (one insertion)
- "amazon" vs "arnazon" = distance 2 (substitute "m" with "r", then insert "n")

Why it matters: Typosquatting is a common phishing technique where attackers register domains similar to legitimate brands.
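
The edit-distance comparison can be sketched as follows. This is an illustrative implementation under stated assumptions: KNOWN_BRANDS is a three-entry stand-in for the real brand list, findSimilarBrand is a hypothetical name, and the domain name is taken naively as the label just before the TLD. Exact matches (distance 0) are skipped so the brand's own domain is not flagged.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Illustrative subset only; the real list is larger.
const KNOWN_BRANDS = ["paypal", "amazon", "google"];

// Hypothetical helper: returns the first brand within maxDistance edits, or null.
function findSimilarBrand(hostname, maxDistance = 2) {
  const labels = hostname.toLowerCase().split(".");
  const name = labels.length >= 2 ? labels[labels.length - 2] : labels[0];
  for (const brand of KNOWN_BRANDS) {
    const distance = levenshtein(name, brand);
    if (distance > 0 && distance <= maxDistance) {
      return { brand, distance };
    }
  }
  return null;
}

console.log(findSimilarBrand("paypall.com"));
```

Taking the second-to-last label is a simplification: it mishandles multi-part suffixes such as .co.uk, which a production check would need to account for.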

5. TLD Analysis

What it does: Flags suspicious top-level domains (TLDs) commonly used in scams.

How it works:

  • Maintains a list of suspicious TLDs: .tk, .ml, .ga, .cf, .gq, .xyz, .top, .click, .zip, .mov
  • Checks if the domain uses one of these TLDs

Why it matters: Some TLDs are frequently used for spam and phishing due to low cost or lax registration policies.
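
A list lookup like this can be sketched with a Set. The TLDs below are the ones listed above; the names SUSPICIOUS_TLDS and hasSuspiciousTld are hypothetical.

```javascript
// TLDs taken from the list above, stored without the leading dot.
const SUSPICIOUS_TLDS = new Set([
  "tk", "ml", "ga", "cf", "gq", "xyz", "top", "click", "zip", "mov",
]);

// Hypothetical helper: true when the hostname ends in a listed TLD.
function hasSuspiciousTld(hostname) {
  const tld = hostname
    .toLowerCase()
    .replace(/\.$/, "") // tolerate a trailing root dot, e.g. "example.zip."
    .split(".")
    .pop();
  return SUSPICIOUS_TLDS.has(tld);
}

console.log(hasSuspiciousTld("free-prizes.tk"));
```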

6. Domain Pattern Analysis

What it does: Detects unusual domain name patterns.

How it works:

  • Counts hyphens in the domain (flags domains with >2 hyphens)
  • Checks domain length (flags domains >30 characters)
  • Identifies patterns that deviate from normal domain structures

Why it matters: Legitimate domains typically have simple, readable structures. Excessive complexity can indicate suspicious intent.
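
The two thresholds above can be sketched as a single function. Whether the length limit applies to the full hostname or only to the registered name is not stated, so this example assumes the full hostname. The function name checkDomainPattern and its options are hypothetical.

```javascript
// Hypothetical helper: flags hostnames with too many hyphens or too many characters.
function checkDomainPattern(hostname, { maxHyphens = 2, maxLength = 30 } = {}) {
  const hyphenCount = (hostname.match(/-/g) || []).length;
  const issues = [];
  if (hyphenCount > maxHyphens) {
    issues.push(`${hyphenCount} hyphens`);
  }
  if (hostname.length > maxLength) {
    issues.push(`${hostname.length} characters long`);
  }
  return { passed: issues.length === 0, issues };
}

console.log(checkDomainPattern("secure-login-account-update.com"));
```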

7. Numeric Pattern Check

What it does: Identifies suspicious numeric patterns in domain names.

How it works:

  • Detects random numbers appended/prepended to text (e.g., paypal-123.com)
  • Identifies date-like patterns (YYYYMMDD, YYMMDD)
  • Calculates numeric ratio and flags domains with >30% numeric characters

Why it matters: Phishing domains often include random numbers to create unique variations of brand names.
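
The three numeric heuristics might be sketched as below. The regular expressions are assumptions chosen for illustration (for example, "appended" is taken to mean at least two digits at either end of the name), and findNumericPatterns is a hypothetical name. The input is a single domain label without the TLD.

```javascript
// Letters followed by 2+ digits (optionally hyphen-separated), or the reverse.
const AFFIXED_NUMBER = /^(?:[a-z]+-?\d{2,}|\d{2,}-?[a-z]+)$/i;
// YYYYMMDD or YYMMDD with a plausible month and day, not embedded in a longer digit run.
const DATE_LIKE =
  /(?:^|\D)(?:(?:19|20)\d{2}|\d{2})(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])(?:\D|$)/;

// Hypothetical helper: returns the list of numeric findings for one label.
function findNumericPatterns(label) {
  const digitCount = (label.match(/\d/g) || []).length;
  const numericRatio = label.length > 0 ? digitCount / label.length : 0;
  const findings = [];
  if (AFFIXED_NUMBER.test(label)) findings.push("affixed-number");
  if (DATE_LIKE.test(label)) findings.push("date-like");
  if (numericRatio > 0.3) findings.push("high-numeric-ratio");
  return findings;
}

console.log(findNumericPatterns("promo20251225"));
```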

8. Mixed Script Detection

What it does: Detects domains mixing different character scripts (homoglyph attacks).

How it works:

  • Checks for multiple character scripts in the same domain:
    • Latin (a-z, A-Z)
    • Cyrillic (а-я, А-Я)
    • Greek (α-ω, Α-Ω)
    • Arabic (ا-ي)
    • Chinese (一-龯)
    • Japanese (ひらがな, カタカナ, 漢字)
  • Flags domains with 2+ scripts

Why it matters: Attackers use look-alike characters from different scripts to create domains that appear identical to legitimate ones (e.g., using Cyrillic 'а' instead of Latin 'a').
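
Script detection can be sketched with Unicode property escapes instead of hand-written character ranges; this is a substitute technique, not necessarily what LegitGuard uses. The example assumes the input is the Unicode (decoded) form of the domain, since the URL API returns the Punycode form. SCRIPT_PATTERNS, detectScripts, and the grouping of hiragana and katakana as "Kana" are hypothetical choices.

```javascript
// One pattern per script; the "u" flag enables \p{Script=...} escapes.
const SCRIPT_PATTERNS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
  Arabic: /\p{Script=Arabic}/u,
  Han: /\p{Script=Han}/u,
  Kana: /[\p{Script=Hiragana}\p{Script=Katakana}]/u,
};

// Hypothetical helper: lists the scripts that appear in the text.
function detectScripts(text) {
  return Object.entries(SCRIPT_PATTERNS)
    .filter(([, pattern]) => pattern.test(text))
    .map(([name]) => name);
}

// Flags text that uses two or more scripts.
function hasMixedScripts(text) {
  return detectScripts(text).length >= 2;
}

// "p\u0430ypal" contains Cyrillic "а" (U+0430) in place of Latin "a".
console.log(detectScripts("p\u0430ypal"));
```

Digits, dots, and hyphens belong to the Common script, so they match none of the patterns and do not count toward the total.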

9. Random String Detection

What it does: Identifies domains with random-looking character sequences.

How it works:

  • Approximates character entropy using simple ratios (vowels vs. consonants, digits vs. letters)
  • Flags domains with high numeric ratios (>40%)
  • Detects very low vowel ratios (<15%) combined with numbers
  • Identifies repeated character patterns (e.g., aaaaa.com)

Why it matters: Legitimate domains typically use readable words, while random strings often indicate automated or suspicious domain generation.
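
The ratio-based rules above might be combined as in this sketch. Several details are assumptions: ratios are computed over letters and digits only, and a run of four or more identical characters counts as a repeated pattern. The function name looksRandom is hypothetical.

```javascript
// Hypothetical helper: applies the three heuristics to a single domain label.
function looksRandom(label) {
  // Ignore hyphens and other separators when computing ratios.
  const chars = label.toLowerCase().replace(/[^a-z0-9]/g, "");
  if (chars.length === 0) {
    return false;
  }
  const digitCount = (chars.match(/[0-9]/g) || []).length;
  const vowelCount = (chars.match(/[aeiou]/g) || []).length;
  const numericRatio = digitCount / chars.length;
  const vowelRatio = vowelCount / chars.length;

  if (numericRatio > 0.4) return true; // mostly digits
  if (vowelRatio < 0.15 && digitCount > 0) return true; // few vowels plus numbers
  if (/(.)\1{3,}/.test(chars)) return true; // same character 4+ times in a row
  return false;
}

console.log(looksRandom("x7k2q9zt"));
```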

10. Port Number Analysis

What it does: Detects non-standard port numbers in URLs.

How it works:

  • Checks if the URL contains a port number other than 80 (HTTP) or 443 (HTTPS)
  • Flags common development ports (3000, 8080, 8443, etc.) as potentially suspicious
  • Standard web traffic uses ports 80/443

Why it matters: Non-standard ports can indicate:

  • Development/testing environments
  • Attempts to bypass security measures
  • Suspicious hosting setups
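
A port check along these lines could rely on the URL API, which reports an empty port string when the URL omits the port or uses the default for its scheme. The function name checkPort and the returned shape are hypothetical.

```javascript
// Hypothetical helper: passes when the URL uses no explicit port, or port 80/443.
function checkPort(urlString) {
  const url = new URL(urlString);
  // url.port is "" for an omitted port or the scheme's default port.
  if (url.port === "") {
    return { passed: true, port: null };
  }
  const port = Number(url.port);
  return { passed: port === 80 || port === 443, port };
}

console.log(checkPort("http://localhost:8080/admin"));
```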

11. Phishing Keyword Detection

What it does: Identifies common phishing and scam keywords in domain names.

How it works:

  • Scans for phishing keywords: secure-, verify-, update-, account-, login-, support-, service-, confirm-, validate-, etc.
  • Detects scam words: free-, win-, prize-, claim-, urgent-, limited-, exclusive-
  • Special detection when combined with brand names

Why it matters: Legitimate brands rarely use these keywords in their domain names. They're commonly used in phishing attempts to create urgency or mimic legitimate services.
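
Keyword scanning might look like the sketch below. It matches whole tokens obtained by splitting the hostname on dots and hyphens, an assumption made so that "win" does not match "windows". The keyword sets are copied from the lists above, SAMPLE_BRANDS is an illustrative subset, and all identifiers are hypothetical.

```javascript
const PHISHING_KEYWORDS = new Set([
  "secure", "verify", "update", "account", "login",
  "support", "service", "confirm", "validate",
]);
const SCAM_KEYWORDS = new Set([
  "free", "win", "prize", "claim", "urgent", "limited", "exclusive",
]);
// Illustrative subset only; the real brand list is larger.
const SAMPLE_BRANDS = new Set(["paypal", "amazon"]);

// Hypothetical helper: reports keyword hits and whether they co-occur with a brand.
function findSuspiciousKeywords(hostname) {
  const tokens = hostname.toLowerCase().split(/[.-]/).filter(Boolean);
  const phishing = tokens.filter((token) => PHISHING_KEYWORDS.has(token));
  const scam = tokens.filter((token) => SCAM_KEYWORDS.has(token));
  const brand = tokens.find((token) => SAMPLE_BRANDS.has(token)) ?? null;
  const keywordCount = phishing.length + scam.length;
  return {
    phishing,
    scam,
    brand,
    keywordWithBrand: keywordCount > 0 && brand !== null,
  };
}

console.log(findSuspiciousKeywords("secure-paypal-login.com"));
```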

Risk Scoring

LegitGuard assigns risk points to each failed check:

  • High Impact: 4-5 points (e.g., IP addresses, mixed scripts)
  • Medium Impact: 3 points (e.g., typosquatting, phishing keywords)
  • Low Impact: 2 points (e.g., suspicious TLDs, numeric patterns)

Risk Levels:

  • Low Risk (0-1 points): "Looks Safe" - No significant issues detected
  • Medium Risk (2-3 points): "Be Careful" - Some suspicious patterns detected
  • High Risk (4+ points): "High Risk" - One high-impact indicator or several lower-impact ones

The maximum risk score is capped at 10 points.
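
The aggregation and level mapping can be sketched as follows. The per-check point values are assumptions picked from within the ranges above (the exact assignments are not documented), and CHECK_POINTS and scoreFailedChecks are hypothetical names.

```javascript
// Assumed point values, chosen to fall inside the documented impact ranges.
const CHECK_POINTS = {
  ipAddress: 5,
  mixedScripts: 4,
  typosquatting: 3,
  phishingKeywords: 3,
  suspiciousTld: 2,
  numericPattern: 2,
};
const MAX_SCORE = 10;

// Hypothetical helper: sums points for failed checks, caps the total, maps to a level.
function scoreFailedChecks(failedCheckNames) {
  const raw = failedCheckNames.reduce(
    (sum, name) => sum + (CHECK_POINTS[name] ?? 0),
    0
  );
  const score = Math.min(raw, MAX_SCORE);
  let level = "Looks Safe"; // 0-1 points
  if (score >= 4) {
    level = "High Risk";
  } else if (score >= 2) {
    level = "Be Careful";
  }
  return { score, level };
}

console.log(scoreFailedChecks(["suspiciousTld"]));
console.log(scoreFailedChecks(["ipAddress", "mixedScripts", "typosquatting"]));
```

With these assumed values, a single suspicious TLD yields "Be Careful", while the three-check example sums to 12 and is capped at 10.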

Privacy Implementation

Local Processing

All analysis happens in the browser extension's JavaScript context:

  1. URL is obtained using browser.tabs.query({ active: true, currentWindow: true })
  2. URL is parsed using the browser's built-in URL API
  3. Domain is extracted and analyzed using string operations and regex
  4. Results are displayed in the popup
  5. No data persists after the popup closes
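
The first three steps can be sketched as below. The tabs API object is passed in as a parameter so the parsing logic can be exercised outside the browser; inside the extension popup it would be browser.tabs. The function names parseHostname and getActiveTabHostname are hypothetical.

```javascript
// Hypothetical helper: returns the hostname, or null if the string is not a valid URL.
function parseHostname(urlString) {
  try {
    return new URL(urlString).hostname;
  } catch {
    return null;
  }
}

// Hypothetical helper: reads the active tab's URL and extracts its hostname.
// `tabsApi` is expected to behave like browser.tabs (query() returns a Promise).
async function getActiveTabHostname(tabsApi) {
  const [tab] = await tabsApi.query({ active: true, currentWindow: true });
  // tab.url is only populated when the extension has permission to read it.
  if (!tab || !tab.url) {
    return null;
  }
  return parseHostname(tab.url);
}

console.log(parseHostname("https://www.example.com:8443/path?q=1"));
```

In the popup script this would be called as getActiveTabHostname(browser.tabs), and everything it produces lives only in the popup's memory.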

No External Communication

LegitGuard:

  • Makes zero external HTTP requests
  • Doesn't use any APIs or web services
  • Doesn't communicate with external servers
  • Works completely offline

Data Lifecycle

  1. Acquisition: Current tab URL is read (required permission)
  2. Processing: Analysis happens in memory
  3. Display: Results shown in popup
  4. Discard: All data is garbage collected when popup closes
  5. No Storage: Nothing is saved to disk or local storage

Limitations and Considerations

Pattern-Based Detection

LegitGuard uses heuristics and pattern matching, which means:

  • False Positives: Legitimate sites may be flagged if they match suspicious patterns
  • False Negatives: Sophisticated attacks may not match known patterns
  • No Real-Time Intelligence: Doesn't use threat feeds or blacklists
  • No Historical Data: Doesn't track domain age, reputation, or history

Algorithm Constraints

  • Brand Database: Limited to 64 brands (expanded from original 9)
  • TLD List: Uses a curated list of suspicious TLDs (may miss new ones)
  • Keyword Lists: Static lists that may not cover all phishing tactics
  • Levenshtein Threshold: Set to 2 edits (may miss more sophisticated typosquatting)

Use Cases

LegitGuard is best suited for:

  • Quick identification of obvious phishing attempts
  • Pattern-based threat detection
  • Privacy-conscious users who want local analysis
  • Offline security checking

Not ideal for:

  • Comprehensive threat intelligence
  • Real-time blacklist checking
  • Enterprise security monitoring
  • Advanced persistent threat (APT) detection

Performance Characteristics

Speed

  • Analysis Time: Typically < 100ms
  • No Network Latency: All processing is local
  • Instant Results: No waiting for external services

Resource Usage

  • Memory: Minimal - only processes current URL
  • CPU: Low - pattern matching is efficient
  • Battery: Negligible - only runs when user clicks icon
  • Storage: Extension size only (~500KB)

Scalability

  • Brand Database: Can easily expand to 100+ brands
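  • Algorithm Complexity: Most checks are O(n) in the domain length; Levenshtein distance is O(m·n) per brand comparison, which is roughly O(n²) for strings of similar length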
  • Maintainability: Modular check functions allow easy addition of new checks

Future Enhancements

Planned improvements include:

  • Additional security checks
  • Enhanced typosquatting detection
  • Improved pattern matching
  • Performance optimizations

Technical Details for Developers

Code Structure

// Main analysis flow
analyzeURL(urlString) 
  → performSecurityChecks(url)
    → Individual check functions
      → Return check results
  → calculateRiskScore(checks)
  → displayResults(analysis)

Key Functions

  • performSecurityChecks(url): Orchestrates all 11 security checks
  • checkTyposquatting(domain): Levenshtein distance algorithm
  • checkMixedScripts(domain): Unicode range detection
  • checkRandomString(domain): Entropy analysis
  • calculateRiskScore(checks): Aggregates risk points
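
To illustrate how modular check functions can be orchestrated, here is a minimal sketch. It is not the extension's actual performSecurityChecks; the names SAMPLE_CHECKS and runChecks, the two sample checks, and the { name, passed } result shape are assumptions for this example.

```javascript
// Each check receives a parsed URL and returns a uniform result object.
const SAMPLE_CHECKS = [
  (url) => ({ name: "https", passed: url.protocol === "https:" }),
  (url) => ({
    name: "punycode",
    passed: !url.hostname.split(".").some((label) => label.startsWith("xn--")),
  }),
];

// Hypothetical orchestrator: parses the URL once and runs every check against it.
function runChecks(urlString, checks = SAMPLE_CHECKS) {
  const url = new URL(urlString);
  return checks.map((check) => check(url));
}

// Adding a new check only requires appending a function to the array.
const failed = runChecks("http://xn--e1afmkfd.example/")
  .filter((result) => !result.passed)
  .map((result) => result.name);
console.log(failed);
```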

Algorithms Used

  • Levenshtein Distance: Dynamic programming algorithm for edit distance
  • Regex Pattern Matching: For domain structure analysis
  • Unicode Range Detection: For script identification
  • Statistical Analysis: Character frequency for entropy calculation