Engineering

Security, Quarantine, and Safe Skill Installation: How Skillsmith Protects You

A deep dive into Skillsmith's defense-in-depth security architecture—from static analysis to trust tiers—and why you can install skills with confidence

Skillsmith Team February 2, 2026 15 min read

Updated: February 12, 2026

Security Shield Hero

Installing third-party code into your development environment requires trust. When you install a Claude Code skill, you’re giving it access to your projects, your files, and Claude’s capabilities.

We take that responsibility seriously.

This guide explains Skillsmith’s multi-layered security architecture: how we scan skills before indexing, how trust tiers help you make informed decisions, and what happens during the quarantine process that protects you from malicious content.

The Threat Landscape

Before explaining our defenses, let’s be honest about the risks. Skills are powerful—and that power can be misused.

What Could Go Wrong?

Threat	Severity	Example
Malicious instructions	Critical	A skill that tells Claude to exfiltrate your `.env` files
Prompt injection	Critical	Hidden text that hijacks Claude’s behavior
AI role injection	Critical	`system:` or `assistant:` prefixes that override context
Dependency hijacking	High	A skill referencing compromised external URLs
Author compromise	Medium	A trusted author’s account gets hacked

Threat Landscape Matrix

We can’t eliminate all risk—no system can. But we can make attacks harder, detection faster, and decisions clearer.

Defense in Depth: Our Security Architecture

Skillsmith uses multiple security layers. If one fails, others catch the threat.

Defense in Depth Layers

Layer 1: Source Validation

Before we even look at a skill’s content, we validate where it comes from.

SSRF Prevention: We block requests to internal networks, localhost, and cloud metadata services:

// Blocked IP ranges
const BLOCKED_RANGES = [
  '10.0.0.0/8',      // Private
  '172.16.0.0/12',   // Private
  '192.168.0.0/16',  // Private
  '127.0.0.0/8',     // Localhost
  '169.254.0.0/16',  // Link-local (cloud metadata)
];

Path Traversal Prevention: We normalize all file paths and reject attempts to escape allowed directories:

// This attack fails
const maliciousPath = '../../../etc/passwd';
// Normalized and validated against root directory
// Result: Error - Path traversal detected

Layer 2: Static Analysis

Every skill passes through our security scanner with nine categories of pattern detection. This is the quarantine phase — details below.

Layer 3: Trust Tiers

Skills are classified into six trust tiers, giving you clear signals about risk and controlling scanner strictness per tier.

Layer 4: Quarantine

Skills flagged as malicious enter an authenticated review workflow requiring multi-approval before any action is taken.

Layer 5: User Decision

You always have the final say. We provide information; you decide.

The Quarantine Process: Static Analysis in Detail

When a skill enters our index, it doesn’t go straight to search results. First, it’s quarantined for security scanning.

Quarantine Flow

What We Scan

Our scanner analyzes the full content of your skill across nine categories of risk. Each finding is weighted by severity and confidence, with documentation context (code blocks, tables) receiving reduced confidence to minimize false positives.

1. Jailbreak Pattern Detection

We search for known phrases that attempt to manipulate Claude:

// 12 patterns including:
const JAILBREAK_PATTERNS = [
  /ignore\s+(previous|prior|all)\s+instructions/i,
  /developer\s+mode/i,
  /bypass\s+(safety|security|restrictions)/i,
  /system\s*:\s*override/i,
  /you\s+are\s+now\s+DAN/i,  // "Do Anything Now" attacks
  /pretend\s+you\s+have\s+no\s+restrictions/i,
];

Why this matters: These patterns are used in prompt injection attacks to make Claude ignore its guidelines.

If detected: Critical finding (weight: 2.0x) — Skill quarantined

2. AI Defence Patterns (CVE-Hardened)

Our newest scanner category (SMI-1532) detects sophisticated injection techniques that bypass simple pattern matching:

// 15 CVE-hardened patterns including:
const AI_DEFENCE_PATTERNS = [
  /(?:system|assistant|user)\s*:/i,      // Role injection
  /\[\[.*instructions.*\]\]/i,           // Hidden instruction blocks
  /<!--.*(?:ignore|override).*-->/i,     // HTML comment injection
  /[\u200B-\u200F\uFEFF]/,              // Zero-width character obfuscation
  /(?:base64|atob|btoa)\s*\(/i,         // Encoded payload delivery
];

Why this matters: These patterns target Claude’s message structure directly, attempting to inject system-level instructions or hide malicious content using Unicode tricks.

If detected: Critical finding (weight: 1.9x) — Skill quarantined

We flag skills that attempt to manipulate Claude through roleplay or persona adoption:

// 12 patterns including:
const SOCIAL_ENGINEERING_PATTERNS = [
  /pretend\s+to\s+be/i,
  /roleplay\s+as/i,
  /act\s+as\s+if/i,
  /you\s+are\s+a\s+different/i,
];

If detected: High finding (weight: 1.5x) — Indexed with warning

4. Prompt Leaking Detection

We catch attempts to extract Claude’s system instructions:

// 14 patterns including:
const PROMPT_LEAKING_PATTERNS = [
  /show\s+me\s+your\s+system\s+instructions/i,
  /reveal\s+your\s+prompt/i,
  /dump\s+system\s+prompt/i,
  /what\s+are\s+your\s+instructions/i,
];

If detected: High finding (weight: 1.8x) — Indexed with warning

5. Data Exfiltration Detection

We flag skills that attempt to send data to external services:

// 21 patterns including:
const DATA_EXFILTRATION_PATTERNS = [
  /btoa\s*\(/i,                    // Base64 encoding
  /fetch\s*\(.*query.*param/i,     // Data in query params
  /XMLHttpRequest/i,               // AJAX requests
  /navigator\.sendBeacon/i,        // Silent data beacons
  /new\s+WebSocket/i,              // WebSocket connections
];

If detected: High finding (weight: 1.7x) — Indexed with warning

6. URL and Domain Analysis

We check every URL in your skill against our allowlist:

const ALLOWED_DOMAINS = [
  'github.com',
  'githubusercontent.com',
  'docs.anthropic.com',
  'anthropic.com',
  'claude.ai',
  'npmjs.com',
  'npmjs.org',
  'docs.github.com',
  'developer.mozilla.org',
  'nodejs.org',
  'typescriptlang.org',
];

// Any URL to a domain not on this list triggers a finding

Why this matters: Malicious skills could reference external URLs that exfiltrate data to attacker-controlled servers, download additional malicious payloads, or track users without consent.

If detected: Medium finding (weight: 0.8x) — Indexed with note

URL Scan Example

7. Sensitive File References

We flag skills that reference files commonly containing secrets:

// 12 patterns including:
const SENSITIVE_PATHS = [
  '.env',              // Environment files
  'credentials',       // Credential files
  'secrets',           // Secret stores
  '.pem',              // SSL certificates
  '.key',              // Private keys
  '.ssh',              // SSH keys
  '.aws',              // AWS credentials
  'password',          // Password files
];

Why this matters: A legitimate skill rarely needs to reference your .env file. If it does, you should know.

If detected: High finding (weight: 1.2x) — Indexed with warning, requires explicit consent to install

8. Privilege Escalation Detection

We flag commands that attempt elevated system access:

// 27 patterns including:
const PRIVILEGE_ESCALATION_PATTERNS = [
  /sudo\s+-S/i,          // Password piping to sudo
  /chmod\s+777/i,        // World-writable permissions
  /setuid/i,             // Set-user-ID
  /\/etc\/sudoers/i,     // Sudoers file manipulation
  /chown\s+root/i,       // Ownership escalation
];

If detected: Critical finding (weight: 1.9x) — Skill quarantined

9. Suspicious Code Patterns

We flag skills containing potentially dangerous commands:

// 11 patterns including:
const SUSPICIOUS_PATTERNS = [
  /eval\s*\(/i,           // Code execution
  /child_process/i,       // Process spawning
  /rm\s+-rf/i,            // Destructive deletion
  /curl\s*\|.*bash/i,     // Pipe-to-shell attacks
];

Why this matters: A testing skill probably shouldn’t contain rm -rf. A deployment skill might legitimately use curl. Context matters, so we flag rather than block.

If detected: Medium finding (weight: 1.3x) — Indexed with note

Risk Score Calculation

Findings aren’t just counted — they’re weighted by severity, category, and confidence:

risk_score = sum(severity_weight * category_weight * confidence)

Severity	Weight	Category	Weight
Critical	50	Jailbreak	2.0x
High	30	AI Defence	1.9x
Medium	15	Privilege Escalation	1.9x
Low	5	Prompt Leaking	1.8x

Confidence adjustments: high=1.0, medium=0.7, low=0.3. Findings inside code blocks or markdown tables receive reduced confidence to avoid false positives from documentation examples.

Each trust tier has a different risk threshold — verified skills tolerate a score up to 70, while unknown-origin skills are blocked above 20.

Scan Results

After all checks complete, each skill receives a recommendation:

Recommendation	Criteria	User Experience
Safe	Score below tier threshold	Normal installation
Review	Medium or High findings	Warning shown, consent required
Quarantine	Critical findings or score above threshold	Requires authenticated multi-approval review

Trust Tiers: Making Risk Visible

Not all skills deserve equal trust. Our six-tier system helps you understand what you’re installing — and controls how strictly each skill is scanned.

Trust Tiers Visual

Verified (Green Badge)

What it means: Publisher identity confirmed, automated scanning passed with lenient thresholds.

Requirements:

Publisher verification (GitHub organization membership)
All automated scans pass
Risk score below 70

Scanner config: Risk threshold 70, max content 2 MB

User experience: Brief confirmation prompt before install.

Example: anthropic/test-fixing, stripe/payment-skill

Curated (Blue Badge)

What it means: Third-party publisher reviewed and approved by the Skillsmith team.

Requirements:

Publisher reviewed by Skillsmith team
All automated scans pass
Risk score below 60

Scanner config: Risk threshold 60, max content 2 MB

User experience: Confirmation with publisher details shown.

Example: linear/issue-tracker, vercel/deploy-helper

Community (Yellow Badge)

What it means: Basic automated scanning passed, but publisher not individually verified.

Requirements:

All automated scans pass
Has license file
Has README
Has valid SKILL.md
Risk score below 40

Scanner config: Risk threshold 40 (default), max content 1 MB

User experience: Consent dialog explaining the risk level.

Example: community/helper-utils, janedoe/quick-commit

Experimental (Orange Badge)

What it means: Beta or newly published skill. Scanned with strict thresholds.

Requirements:

All automated scans pass
Risk score below 25

Scanner config: Risk threshold 25 (strict), max content 500 KB

User experience: Warning about experimental status, explicit consent.

Example: newdev/alpha-tool, beta/prototype-helper

Unknown (Red Badge)

What it means: No verification. Direct GitHub install with strictest scanning.

Requirements:

All automated scans pass
Risk score below 20

Scanner config: Risk threshold 20 (strictest), max content 250 KB

User experience: Strong warning, explicit opt-in required.

Example: Installing directly from a GitHub URL

Local (No Badge)

What it means: Your own skills, installed from your local filesystem.

Scanner config: Risk threshold 100 (no limit), max content 10 MB

User experience: No prompts — you wrote it, you trust it.

Tier Progression

Skills can move up (or down) the trust ladder:

Tier Progression

From	To	Requirements
Unknown	Experimental	Pass security scan, add SKILL.md
Experimental	Community	Add license + README, pass 40-threshold scan
Community	Curated	Skillsmith team review of publisher
Curated	Verified	Publisher identity verification (GitHub org)

Downgrade triggers:

Failed security scan → Drops to Unknown
Quarantined for malicious content → Removed from search entirely
Publisher violation → All publisher’s skills downgraded

The Quarantine System: Rapid Response to Threats

When we identify a potentially malicious skill, it enters our authenticated quarantine workflow.

How the Quarantine System Works

The quarantine system is a database-backed review workflow with role-based access control:

// Quarantine entry
interface QuarantineEntry {
  skill_id: string;           // "malicious-author/evil-skill"
  reason: string;             // "Detected data exfiltration attempt"
  severity: 'low' | 'medium' | 'high' | 'malicious';
  status: 'pending' | 'approved' | 'rejected';
  reviewer_id: string;        // Authenticated reviewer
  created_at: string;
}

Key features:

Authenticated review workflow — Only authorized reviewers can approve or reject
Multi-approval for MALICIOUS severity — Requires 2 independent reviewers for the most severe cases
Permission-based access — Separate permissions for quarantine:read, quarantine:review, and quarantine:review_malicious
Full audit trail — Every review action is logged with reviewer identity and timestamp

What Happens When a Skill is Quarantined

Immediate removal from search results
Installation blocked for all users
Review queue — Skill enters authenticated review workflow
Multi-approval (if malicious severity) — Requires 2 reviewers
Decision — Approve (restore) or reject (permanently block)

Installation-Time Quarantine Check

Even after initial indexing, skills are re-checked at install time:

// Before any install proceeds
const quarantineCheck = await checkQuarantine(skillId);
if (quarantineCheck.isQuarantined) {
  throw new Error(`Skill "${skillId}" is quarantined: ${quarantineCheck.reason}`);
}

Quarantine Review Flow

Reporting Suspicious Skills

Found something concerning? Report it:

GitHub Issue — Open an issue on skillsmith/security
Email — [email protected]
In-app — Use the “Report Skill” option when viewing skill details

We investigate all reports within 24 hours.

What Happens at Installation Time

Even after all our pre-indexing security, installation is another checkpoint.

Installation Flow Security

Pre-Installation Checks

Quarantine check — Re-verify skill isn’t quarantined (status may have changed)
Trust tier display — Show badge and any warnings
Live security scan — Scan fetched content with tier-appropriate risk thresholds
Consent — Get explicit user approval (varies by tier)

Content Integrity

After installation, we compute a SHA-256 hash of the skill content:

const contentHash = crypto.createHash('sha256')
  .update(skillContent)
  .digest('hex');

// Stored in install manifest for conflict detection
manifest.contentHash = contentHash;

This hash is used to detect local modifications when reinstalling or updating — if you’ve customized a skill locally, Skillsmith will warn you before overwriting your changes.

Post-Installation

After installation, the skill lives in your ~/.claude/skills/ directory. You have full control:

# View installed skills
ls ~/.claude/skills/

# Remove a skill manually
rm -rf ~/.claude/skills/suspicious-skill/

# Or use Skillsmith
"Uninstall the suspicious-skill"

Audit Logging: Transparency and Forensics

Every security-relevant event is logged for transparency and incident response.

What We Log

Event	Data Captured
Skill indexed	Skill ID, source, timestamp, scan results
Scan finding	Finding type, severity, matched content (sanitized)
Quarantine action	Skill quarantined/approved/rejected, reviewer identity
Installation	Skill ID, user consent given, trust tier at time
Security alert	Alert type, affected skills, detection method

Privacy Note

We log security events, not user behavior. We don’t track:

Which skills you browse
Your search queries
Your codebase contents
Anything that could identify you personally

Audit logs exist to improve security and respond to incidents—not to surveil users.

Platform Limitations (Honest Disclosure)

We believe in transparency about what we can’t protect against.

What Requires Anthropic Platform Changes

Security Feature	Why We Can’t Implement It
Runtime sandboxing	Skills execute in Claude’s process
Permission model	No capability restrictions exist today
Network isolation	Claude controls network access
File access restrictions	Claude has your user’s file permissions

These are platform-level features that would require changes to Claude Code itself. We advocate for them, but they’re outside our control.

What This Means for You

Skills you install have the same capabilities as Claude Code itself. Our security measures reduce the risk of installing malicious skills, but can’t restrict what legitimate skills can do.

Our recommendation:

Stick to Verified and Curated tiers when possible
Review Community and Experimental skills before installing
Be cautious with Unknown-tier skills
Report anything suspicious

Security Checklist for Skill Authors

If you’re publishing skills, here’s how to build trust:

Do

Use a clear, unique name — Avoid similarity to popular skills
Include a license — MIT or Apache-2.0 recommended
Document everything — Clear README and SKILL.md
Explain external URLs — If you need them, say why
Minimize permissions — Only ask Claude to do what’s necessary
Publish under a verified account — Verify your GitHub org
Use execFileSync with array args — Never interpolate user input into shell commands
Structure for composability — Design skills for reuse with proper agent/skill separation

Don’t

Reference sensitive files — Never touch .env, credentials, keys
Use obfuscation — Triggers AI defence pattern detection
Include unnecessary commands — Avoid rm, curl, eval unless essential
Hide functionality — Be transparent about what your skill does
Use role prefixes — system:, assistant:, user: trigger injection detection

Frequently Asked Questions

”Can a skill steal my API keys?”

Theoretically, yes—Claude has access to your files. Practically, our security scanning catches most attempts to reference sensitive files. We flag any skill that mentions .env, credentials, or similar patterns.

Mitigation: Use environment variables and secrets managers like Varlock. Never store secrets in plaintext files. Varlock validates your .env against a schema and masks sensitive values — even if a skill tries to read your secrets, Varlock ensures they’re never exposed in terminal output.

”What if a trusted author gets hacked?”

We detect anomalies like sudden large changes to popular skills. If a Verified skill suddenly adds suspicious content, it triggers a review and potential quarantine.

Mitigation: Our quarantine system requires multi-approval for MALICIOUS-severity findings, so a single compromised reviewer can’t approve a malicious skill.

”Can I run skills in a sandbox?”

Not currently—this requires platform support from Anthropic. We’re advocating for this feature.

Workaround: Use a development environment or container for testing new skills.

”How do I report a vulnerability in Skillsmith itself?”

Email [email protected] with details. We follow responsible disclosure practices and will credit researchers (if desired) once fixed.

Summary

Skillsmith’s security architecture provides defense in depth:

Source Validation — SSRF and path traversal prevention
Static Analysis — Nine-category scan with weighted risk scoring during quarantine
Trust Tiers — Six-tier classification (Verified > Curated > Community > Experimental > Unknown) with per-tier risk thresholds
Quarantine — Authenticated multi-approval review for malicious findings
Installation Checks — Live re-scan and quarantine check at install time
User Control — You always have the final decision

We can’t guarantee perfect security—no one can. But we’ve designed every layer to make attacks harder, detection faster, and your decisions more informed.

Install skills with confidence. And if something seems off, report it.

How the Indexer Works

Curious about the other half of the story? Read From GitHub to Search Results: How Skillsmith Indexes and Curates Skills to understand the full journey from repository to searchable skill.

Questions about security? Reach out at [email protected] or open an issue on GitHub.

The Threat Landscape

What Could Go Wrong?

Defense in Depth: Our Security Architecture

Layer 1: Source Validation

Layer 2: Static Analysis

Layer 3: Trust Tiers

Layer 4: Quarantine

Layer 5: User Decision

The Quarantine Process: Static Analysis in Detail

What We Scan

1. Jailbreak Pattern Detection

2. AI Defence Patterns (CVE-Hardened)

3. Social Engineering Detection

4. Prompt Leaking Detection

5. Data Exfiltration Detection

6. URL and Domain Analysis

7. Sensitive File References

8. Privilege Escalation Detection

9. Suspicious Code Patterns

Risk Score Calculation

Scan Results

Trust Tiers: Making Risk Visible

Verified (Green Badge)

Curated (Blue Badge)

Community (Yellow Badge)

Experimental (Orange Badge)

Unknown (Red Badge)

Local (No Badge)

Tier Progression

The Quarantine System: Rapid Response to Threats

How the Quarantine System Works

What Happens When a Skill is Quarantined

Installation-Time Quarantine Check

Reporting Suspicious Skills

What Happens at Installation Time

Pre-Installation Checks

Content Integrity

Post-Installation

Audit Logging: Transparency and Forensics

What We Log

Privacy Note

Platform Limitations (Honest Disclosure)

What Requires Anthropic Platform Changes

What This Means for You

Security Checklist for Skill Authors

Do

Don’t

Frequently Asked Questions

”Can a skill steal my API keys?”

”What if a trusted author gets hacked?”

”Can I run skills in a sandbox?”

”How do I report a vulnerability in Skillsmith itself?”

Summary

How the Indexer Works