How LegitGuard Works

Overview

LegitGuard uses pattern-based security analysis algorithms that run entirely on your device. When you click the LegitGuard icon, it analyzes the current webpage's URL and domain name to identify potentially suspicious patterns commonly used in phishing attacks and scams.

Technical Architecture

Extension Structure

LegitGuard is a Safari Web Extension with the following components:

Popup Interface (popup.html, popup.js, popup.css): The user interface that displays analysis results
Background Service Worker (background.js): Handles extension lifecycle events
Content Script (content.js): Optional content script for future enhancements
Manifest (manifest.json): Extension configuration and permissions

Analysis Flow

User Action: User clicks the LegitGuard icon in Safari's toolbar
URL Retrieval: Extension gets the current tab's URL using the WebExtensions browser.tabs.query() API
Local Analysis: Security checks are performed locally using JavaScript
Result Display: Analysis results are displayed in the popup interface
No External Communication: No data is sent to external servers

Security Checks Explained

1. HTTPS Check

What it does: Verifies if the website uses HTTPS encryption.

How it works:

Checks if the URL protocol is https:
Flags HTTP connections as potentially unsafe

Why it matters: Unencrypted connections can expose your data to interception.
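A minimal sketch of this check, assuming the page URL has already been parsed with the built-in URL API. The function name checkHttps and the { name, passed } result shape are illustrative assumptions, not taken from LegitGuard's source.

```javascript
// Hypothetical sketch: flag any page that is not served over HTTPS.
// URL.protocol includes the trailing colon, so compare against "https:".
function checkHttps(url) {
  const passed = url.protocol === "https:";
  return { name: "HTTPS", passed };
}

console.log(checkHttps(new URL("https://example.com/")).passed); // true
console.log(checkHttps(new URL("http://example.com/")).passed);  // false
```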

2. IP Address Check

What it does: Detects if the website uses an IP address instead of a domain name.

How it works:

Uses regex to detect IPv4 addresses (e.g., 192.168.1.1)
Also flags IPv6 addresses (which appear in brackets in URLs, e.g., [2001:db8::1])
Legitimate sites rarely use IP addresses directly

Why it matters: Phishing sites often use IP addresses to avoid domain reputation systems.
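One way to sketch this check, assuming it runs on the hostname property of a parsed URL. Because the URL parser already normalizes IPv4 hosts to dotted-decimal form and keeps IPv6 literals in brackets, simple patterns suffice here. checkIpAddress is a hypothetical name.

```javascript
// Hypothetical sketch: detect IP-address hosts in URL.hostname.
// The URL parser canonicalizes IPv4 hosts (e.g. "0x7f.1" -> "127.0.0.1"),
// so a plain dotted-quad pattern is enough after parsing.
const IPV4_HOST = /^\d{1,3}(\.\d{1,3}){3}$/;

function checkIpAddress(hostname) {
  // IPv6 literals keep their brackets in URL.hostname, e.g. "[2001:db8::1]".
  const isIPv6 = hostname.startsWith("[") && hostname.endsWith("]");
  const isIPv4 = IPV4_HOST.test(hostname);
  return { name: "IP address", passed: !isIPv4 && !isIPv6 };
}

console.log(checkIpAddress(new URL("http://192.168.1.1/").hostname).passed); // false
console.log(checkIpAddress("example.com").passed);                           // true
```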

3. Punycode Detection

What it does: Identifies internationalized domain names (IDN) that use Punycode encoding.

How it works:

Checks whether any label of the domain starts with the xn-- prefix
This prefix marks a Punycode-encoded label, i.e., one whose original form contains non-ASCII characters

Why it matters: Attackers use similar-looking characters from different scripts (homoglyphs) to create fake domains.
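A minimal sketch of the prefix test, under the assumption that it inspects each dot-separated label of URL.hostname. It relies on a property of the URL API: Unicode hostnames are converted to their Punycode (ASCII) form during parsing, so a look-alike domain reaches this check already carrying the xn-- prefix. checkPunycode is a hypothetical name.

```javascript
// Hypothetical sketch: any label beginning with "xn--" is Punycode-encoded.
function checkPunycode(hostname) {
  const hasPunycode = hostname
    .toLowerCase()
    .split(".")
    .some((label) => label.startsWith("xn--"));
  return { name: "Punycode", passed: !hasPunycode };
}

// "\u0430" is Cyrillic "а"; the URL API stores this hostname in Punycode form.
const spoof = new URL("https://\u0430pple.com/");
console.log(checkPunycode(spoof.hostname).passed); // false
```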

4. Brand Similarity (Typosquatting)

What it does: Detects domains that are similar to known brands, indicating potential typosquatting.

How it works:

Uses the Levenshtein distance algorithm to calculate the edit distance between the domain name and each of 64 known brand names
Flags domains within 2 character edits of a known brand
Examples: paypall.com (1 edit from paypal), arnazon.com (2 edits from amazon)

Algorithm Details:

Levenshtein distance calculates the minimum number of single-character edits 
(insertions, deletions, or substitutions) required to change one word into another.

Example:
- "paypal" vs "paypall" = distance 1 (one insertion)
- "amazon" vs "arnazon" = distance 2 (one substitution, m→r, plus one insertion, n)

Why it matters: Typosquatting is a common phishing technique where attackers register domains similar to legitimate brands.
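The Levenshtein computation described above can be sketched with the standard dynamic-programming recurrence, keeping only two rows of the table. The four-entry BRANDS list and the name checkTyposquatting are hypothetical stand-ins (the real list has 64 entries), and the input is assumed to be the domain name without its TLD, e.g. "paypall" from paypall.com. Skipping distance 0 is also an assumption, since an exact match is the brand itself.

```javascript
// Levenshtein distance via dynamic programming with two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical brand list for illustration only.
const BRANDS = ["paypal", "amazon", "google", "apple"];

// Flags names 1..maxDistance edits from a brand (distance 0 is the brand itself).
function checkTyposquatting(name, maxDistance = 2) {
  for (const brand of BRANDS) {
    const d = levenshtein(name.toLowerCase(), brand);
    if (d > 0 && d <= maxDistance) {
      return { name: "Typosquatting", passed: false, brand, distance: d };
    }
  }
  return { name: "Typosquatting", passed: true };
}

console.log(levenshtein("paypal", "paypall")); // 1
console.log(levenshtein("amazon", "arnazon")); // 2
```

The two-row layout keeps memory at O(n) while the running time stays O(m × n) per brand comparison.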

5. TLD Analysis

What it does: Flags suspicious top-level domains (TLDs) commonly used in scams.

How it works:

Maintains a list of suspicious TLDs: .tk, .ml, .ga, .cf, .gq, .xyz, .top, .click, .zip, .mov
Checks if the domain uses one of these TLDs

Why it matters: Some TLDs are frequently used for spam and phishing due to low cost or lax registration policies.
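A short sketch of the lookup, using the TLD list given above. It assumes the TLD is simply the last dot-separated label of the hostname; checkSuspiciousTld is a hypothetical name.

```javascript
// Suspicious TLDs as listed in this section.
const SUSPICIOUS_TLDS = new Set([
  "tk", "ml", "ga", "cf", "gq", "xyz", "top", "click", "zip", "mov",
]);

// Hypothetical sketch: the last hostname label is the TLD.
function checkSuspiciousTld(hostname) {
  const tld = hostname.toLowerCase().split(".").pop();
  return { name: "TLD", passed: !SUSPICIOUS_TLDS.has(tld), tld };
}

console.log(checkSuspiciousTld("free-prizes.xyz").passed); // false
console.log(checkSuspiciousTld("example.com").passed);     // true
```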

6. Domain Pattern Analysis

What it does: Detects unusual domain name patterns.

How it works:

Counts hyphens in the domain (flags domains with >2 hyphens)
Checks domain length (flags domains >30 characters)
Identifies patterns that deviate from normal domain structures

Why it matters: Legitimate domains typically have simple, readable structures. Excessive complexity can indicate suspicious intent.
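The two thresholds above can be sketched as follows. Whether LegitGuard measures the full hostname or only the registrable name is not stated, so applying both rules to the full hostname is an assumption, as is the name checkDomainPattern.

```javascript
// Hypothetical sketch: more than 2 hyphens or more than 30 characters.
function checkDomainPattern(hostname) {
  const hyphens = (hostname.match(/-/g) || []).length;
  const tooManyHyphens = hyphens > 2;
  const tooLong = hostname.length > 30;
  return {
    name: "Domain pattern",
    passed: !tooManyHyphens && !tooLong,
    hyphens,
    length: hostname.length,
  };
}

console.log(checkDomainPattern("secure-login-verify-account.com").hyphens); // 3
```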

7. Numeric Pattern Check

What it does: Identifies suspicious numeric patterns in domain names.

How it works:

Detects random numbers appended/prepended to text (e.g., paypal-123.com)
Identifies date-like patterns (YYYYMMDD, YYMMDD)
Calculates numeric ratio and flags domains with >30% numeric characters

Why it matters: Phishing domains often include random numbers to create unique variations of brand names.
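A sketch of the three numeric rules, assuming they run on the domain name without its TLD (e.g. "paypal-123"). The exact regular expressions are illustrative guesses: "number attached to text" is taken to mean a run of two or more digits adjacent to a letter (optionally via a hyphen), and the date rule accepts YYMMDD with an optional 19/20 century prefix.

```javascript
// Hypothetical patterns; LegitGuard's actual expressions may differ.
const ATTACHED_NUMBER = /[a-z]-?\d{2,}|\d{2,}-?[a-z]/i;
const DATE_LIKE = /(?:19|20)?\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])/;

function checkNumericPatterns(name) {
  const alnum = name.replace(/[^a-z0-9]/gi, "");
  const digits = alnum.replace(/\D/g, "").length;
  const ratio = alnum.length ? digits / alnum.length : 0;

  const reasons = [];
  if (ATTACHED_NUMBER.test(name)) reasons.push("number attached to text");
  if (DATE_LIKE.test(name)) reasons.push("date-like sequence");
  if (ratio > 0.3) reasons.push("more than 30% digits");
  return { name: "Numeric pattern", passed: reasons.length === 0, reasons };
}

// "paypal-123": digits attached to text, and 3 of 9 characters are digits.
console.log(checkNumericPatterns("paypal-123").reasons.length); // 2
```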

8. Mixed Script Detection

What it does: Detects domains mixing different character scripts (homoglyph attacks).

How it works:

Checks for multiple character scripts in the same domain:
- Latin (a-z, A-Z)
- Cyrillic (а-я, А-Я)
- Greek (α-ω, Α-Ω)
- Arabic (ا-ي)
- Chinese (一-龯)
- Japanese (ひらがな, カタカナ, 漢字)
Flags domains with 2+ scripts

Why it matters: Attackers use look-alike characters from different scripts to create domains that appear identical to legitimate ones (e.g., using Cyrillic 'а' instead of Latin 'a').
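Script detection can be sketched with JavaScript's Unicode property escapes (\p{Script=...} with the u flag) instead of hand-written character ranges. Two assumptions apply. First, Han, Hiragana and Katakana are grouped as one "CJK" group, because ordinary Japanese names mix all three. Second, the input must be the Unicode form of the label: URL.hostname returns the Punycode form, which contains only ASCII.

```javascript
// Script groups checked by this sketch (CJK grouping is an assumption).
const SCRIPT_GROUPS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
  Arabic: /\p{Script=Arabic}/u,
  CJK: /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u,
};

// Hypothetical sketch: flag a label containing characters from 2+ groups.
function checkMixedScripts(label) {
  const scripts = Object.keys(SCRIPT_GROUPS).filter((s) =>
    SCRIPT_GROUPS[s].test(label)
  );
  return { name: "Mixed scripts", passed: scripts.length < 2, scripts };
}

checkMixedScripts("apple").scripts;      // → ["Latin"]
checkMixedScripts("\u0430pple").scripts; // → ["Latin", "Cyrillic"] ("\u0430" is Cyrillic "а")
```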

9. Random String Detection

What it does: Identifies domains with random-looking character sequences.

How it works:

Estimates how random the name looks from its character composition (vowel/consonant ratios)
Flags domains with high numeric ratios (>40%)
Detects very low vowel ratios (<15%) combined with numbers
Identifies repeated character patterns (e.g., aaaaa.com)

Why it matters: Legitimate domains typically use readable words, while random strings often indicate automated or suspicious domain generation.
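A sketch combining the rules above. It uses the listed thresholds (over 40% digits, under 15% vowels when digits are present). The remaining details are assumptions: vowels are taken as a, e, i, o, u, the vowel ratio is computed over letters only, and a "repeated pattern" means the same character five or more times in a row. checkRandomString matches the function name listed under Key Functions, but this body is only illustrative.

```javascript
// Hypothetical sketch of the random-string heuristics.
function checkRandomString(name) {
  const chars = name.toLowerCase().replace(/[^a-z0-9]/g, "");
  if (chars.length === 0) return { name: "Random string", passed: true };

  const digits = (chars.match(/\d/g) || []).length;
  const letters = chars.length - digits;
  const vowels = (chars.match(/[aeiou]/g) || []).length;

  const numericRatio = digits / chars.length;
  const vowelRatio = letters ? vowels / letters : 0;

  const suspicious =
    numericRatio > 0.4 ||                 // mostly digits
    (vowelRatio < 0.15 && digits > 0) ||  // almost no vowels, plus digits
    /(.)\1{4,}/.test(chars);              // same character 5+ times in a row
  return { name: "Random string", passed: !suspicious };
}

console.log(checkRandomString("x7k9q2z8").passed); // false
console.log(checkRandomString("google").passed);   // true
```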

10. Port Number Analysis

What it does: Detects non-standard port numbers in URLs.

How it works:

Checks if the URL contains a port number other than 80 (HTTP) or 443 (HTTPS)
Flags common development ports (3000, 8080, 8443, etc.) as potentially suspicious
Standard web traffic uses ports 80/443

Why it matters: Non-standard ports can indicate:

Development/testing environments
Attempts to bypass security measures
Suspicious hosting setups
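With the URL API this check is short, because URL.port is an empty string whenever the port is omitted or equals the scheme's default (so https://example.com:443/ reports ""). The sketch below treats every port other than 80 and 443 as non-standard, which covers the development ports listed above; checkPort is a hypothetical name.

```javascript
// Hypothetical sketch: flag any explicit port other than 80 or 443.
function checkPort(url) {
  const port = url.port; // "" when omitted or equal to the scheme default
  const passed = port === "" || port === "80" || port === "443";
  return { name: "Port", passed, port };
}

console.log(checkPort(new URL("https://example.com:443/")).port);    // "" (default port is dropped)
console.log(checkPort(new URL("http://example.com:8080/")).passed); // false
```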

11. Phishing Keyword Detection

What it does: Identifies common phishing and scam keywords in domain names.

How it works:

Scans for phishing keywords: secure-, verify-, update-, account-, login-, support-, service-, confirm-, validate-, etc.
Detects scam words: free-, win-, prize-, claim-, urgent-, limited-, exclusive-
Applies extra scrutiny when a keyword appears alongside a known brand name in the same domain

Why it matters: Legitimate brands rarely use these keywords in their domain names. They're commonly used in phishing attempts to create urgency or mimic legitimate services.
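A sketch of keyword matching, assuming the domain name (without TLD) is split on hyphens and each piece is compared against the keyword lists above. This catches both "secure-paypal" and "paypal-secure". The three-entry BRAND_NAMES list and the withBrand flag are hypothetical illustrations of the brand combination rule.

```javascript
const PHISHING_KEYWORDS = ["secure", "verify", "update", "account", "login",
  "support", "service", "confirm", "validate"];
const SCAM_KEYWORDS = ["free", "win", "prize", "claim", "urgent", "limited", "exclusive"];
const BRAND_NAMES = ["paypal", "amazon", "apple"]; // hypothetical subset

// Hypothetical sketch: match hyphen-separated tokens against keyword lists.
function checkKeywords(name) {
  const tokens = name.toLowerCase().split("-");
  const phishing = tokens.filter((t) => PHISHING_KEYWORDS.includes(t));
  const scam = tokens.filter((t) => SCAM_KEYWORDS.includes(t));
  const hits = phishing.length + scam.length;
  // A keyword next to a brand name is the stronger signal.
  const withBrand = hits > 0 && tokens.some((t) => BRAND_NAMES.includes(t));
  return { name: "Keywords", passed: hits === 0, phishing, scam, withBrand };
}

console.log(checkKeywords("paypal-secure-login").withBrand); // true
```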

Risk Scoring

LegitGuard assigns risk points to each failed check:

High Impact: 4-5 points (e.g., IP addresses, mixed scripts)
Medium Impact: 3 points (e.g., typosquatting, phishing keywords)
Low Impact: 2 points (e.g., suspicious TLDs, numeric patterns)

Risk Levels:

Low Risk (0-1 points): "Looks Safe" - No significant issues detected
Medium Risk (2-3 points): "Be Careful" - Some suspicious patterns detected
High Risk (4+ points): "High Risk" - Multiple suspicious indicators

The maximum risk score is capped at 10 points.
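The scoring rules can be sketched as a sum over failed checks, capped at 10 and mapped to the three levels above. The assumption here is that each check result carries its own points value; calculateRiskScore matches the name listed under Key Functions, but this body is illustrative only.

```javascript
// Hypothetical sketch: sum points of failed checks, cap at 10, map to a level.
function calculateRiskScore(checks) {
  const raw = checks
    .filter((c) => !c.passed)
    .reduce((sum, c) => sum + c.points, 0);
  const score = Math.min(raw, 10);

  let level;
  if (score <= 1) level = "Looks Safe";
  else if (score <= 3) level = "Be Careful";
  else level = "High Risk";
  return { score, level };
}

// Suspicious TLD (2 points) + typosquatting (3 points) = 5 → "High Risk"
console.log(calculateRiskScore([
  { passed: false, points: 2 },
  { passed: false, points: 3 },
]).level); // "High Risk"
```

Under these thresholds a single low-impact finding (2 points) already yields "Be Careful", and any high-impact finding alone yields "High Risk".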

Privacy Implementation

Local Processing

All analysis happens in the browser extension's JavaScript context:

URL is obtained using browser.tabs.query({ active: true, currentWindow: true })
URL is parsed using the browser's built-in URL API
Domain is extracted and analyzed using string operations and regex
Results are displayed in the popup
No data persists after the popup closes
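The acquisition and parsing steps can be sketched as below, assuming the standard WebExtensions browser.tabs.query API. Passing the tabs API in as a parameter is a hypothetical design choice, made here so the logic can be exercised outside Safari with a stub. tab.url is only present when the extension has the necessary permission, hence the null check.

```javascript
// Hypothetical sketch of the popup's URL acquisition step.
async function getActiveTabUrl(tabsApi) {
  const [tab] = await tabsApi.query({ active: true, currentWindow: true });
  if (!tab || !tab.url) return null; // no active tab, or URL not exposed
  try {
    return new URL(tab.url);         // parsed with the built-in URL API
  } catch {
    return null;                     // not a parseable URL
  }
}

// In the popup: const url = await getActiveTabUrl(browser.tabs);
```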

No External Communication

LegitGuard:

Makes zero external HTTP requests
Doesn't use any APIs or web services
Doesn't communicate with external servers
Works completely offline

Data Lifecycle

Acquisition: Current tab URL is read (this requires the permission declared in manifest.json)
Processing: Analysis happens in memory
Display: Results shown in popup
Discard: All data is garbage collected when popup closes
No Storage: Nothing is saved to disk or local storage

Limitations and Considerations

Pattern-Based Detection

LegitGuard uses heuristics and pattern matching, which means:

False Positives: Legitimate sites may be flagged if they match suspicious patterns
False Negatives: Sophisticated attacks may not match known patterns
No Real-Time Intelligence: Doesn't use threat feeds or blacklists
No Historical Data: Doesn't track domain age, reputation, or history

Algorithm Constraints

Brand Database: Limited to 64 brands (expanded from original 9)
TLD List: Uses a curated list of suspicious TLDs (may miss new ones)
Keyword Lists: Static lists that may not cover all phishing tactics
Levenshtein Threshold: Set to 2 edits (may miss more sophisticated typosquatting)

Use Cases

LegitGuard is best suited for:

Quick identification of obvious phishing attempts
Pattern-based threat detection
Privacy-conscious users who want local analysis
Offline security checking

Not ideal for:

Comprehensive threat intelligence
Real-time blacklist checking
Enterprise security monitoring
Advanced persistent threat (APT) detection

Performance Characteristics

Speed

Analysis Time: Typically < 100ms
No Network Latency: All processing is local
Instant Results: No waiting for external services

Resource Usage

Memory: Minimal - only processes current URL
CPU: Low - pattern matching is efficient
Battery: Negligible - only runs when user clicks icon
Storage: Extension size only (~500KB)

Scalability

Brand Database: Can easily expand to 100+ brands
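Algorithm Complexity: Most checks are O(n) in the domain length; Levenshtein comparison is O(n²) per brand, so typosquatting detection scales with the number of brands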
Maintainability: Modular check functions allow easy addition of new checks

Future Enhancements

Planned improvements include:

Additional security checks
Enhanced typosquatting detection
Improved pattern matching
Performance optimizations

Technical Details for Developers

Code Structure

// Main analysis flow
analyzeURL(urlString) 
  → performSecurityChecks(url)
    → Individual check functions
      → Return check results
  → calculateRiskScore(checks)
  → displayResults(analysis)
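The flow above can be sketched as plain functions, assuming each check is a small function returning { name, passed, points }. That shape is what makes the checks modular: adding a check means appending a function. Passing the checks array as a second parameter differs from the documented performSecurityChecks(url) signature and is a hypothetical choice, as are the demo checks and their point values. displayResults is omitted because it is UI code.

```javascript
// Hypothetical sketch of analyzeURL → performSecurityChecks → score.
function performSecurityChecks(url, checks) {
  return checks.map((check) => check(url));
}

function analyzeURL(urlString, checks) {
  let url;
  try {
    url = new URL(urlString);
  } catch {
    return { error: "Invalid URL" };
  }
  const results = performSecurityChecks(url, checks);
  const raw = results.filter((r) => !r.passed).reduce((s, r) => s + r.points, 0);
  return { hostname: url.hostname, results, score: Math.min(raw, 10) };
}

// Two illustrative checks; the point values are placeholders.
const demoChecks = [
  (url) => ({ name: "HTTPS", passed: url.protocol === "https:", points: 2 }),
  (url) => ({ name: "Port", passed: ["", "80", "443"].includes(url.port), points: 2 }),
];

console.log(analyzeURL("http://example.com:3000/", demoChecks).score); // 4
```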

Key Functions

performSecurityChecks(url): Orchestrates all 11 security checks
checkTyposquatting(domain): Levenshtein distance algorithm
checkMixedScripts(domain): Unicode range detection
checkRandomString(domain): Entropy analysis
calculateRiskScore(checks): Aggregates risk points

Algorithms Used

Levenshtein Distance: Dynamic programming algorithm for edit distance
Regex Pattern Matching: For domain structure analysis
Unicode Range Detection: For script identification
Statistical Analysis: Character frequency for entropy calculation
