Developer Tools & Agent Integration

MCP tools

The framework exposes 12 MCP tools through `hermes-redteam/mcp-server.ts`, usable from any MCP-compatible agent:

Tool	Purpose
`read_repo`	Scan source code for tools, roles, guardrails, secrets
`probe_target`	Send a benign test message to observe the API shape
`read_prior_reports`	Load previous scan results for adaptive planning
`write_config`	Generate and save a red-team config
`write_custom_attacks`	Create custom attack CSV/JSON files
`write_policy`	Create judge policy with per-category overrides
`run_scan`	Start a red-team scan via dashboard API
`check_run_status`	Poll scan progress (attacks completed, phase)
`get_run_results`	Get full results with verdicts and findings
`cancel_run`	Cancel a running scan
`list_categories_and_strategies`	List all 141 categories + 155 strategies with compliance mappings
`suggest_guardrails`	Map vulnerabilities to Votal Shield guardrail configs
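For agents that talk to the server directly, MCP tool invocations travel as JSON-RPC 2.0 `tools/call` requests. The following is a minimal sketch of building such a request for one of the tools above. The helper `buildToolCall` and the `url` argument are hypothetical; the actual argument schema of each tool is defined in `hermes-redteam/mcp-server.ts` and is not documented here.

```typescript
// The 12 tool names listed in the table above.
const TOOL_NAMES = [
  "read_repo",
  "probe_target",
  "read_prior_reports",
  "write_config",
  "write_custom_attacks",
  "write_policy",
  "run_scan",
  "check_run_status",
  "get_run_results",
  "cancel_run",
  "list_categories_and_strategies",
  "suggest_guardrails",
] as const;

type ToolName = (typeof TOOL_NAMES)[number];

// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request object.
function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumption: `url` is an illustrative argument name, not the documented schema.
const request = buildToolCall(1, "probe_target", { url: "http://localhost:3000" });
console.log(JSON.stringify(request, null, 2));
```

Typing the name as `ToolName` lets the compiler reject a misspelled tool before the request is sent.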

Register with Hermes:

hermes mcp add wb-redteam -- npx tsx "$(pwd)/hermes-redteam/mcp-server.ts"

Natural conversation with Hermes:

You:    test my chatbot at http://localhost:3000 for safety issues
Hermes: [calls probe_target, write_config, run_scan]
        Scan started. 15 categories, 22 strategies, 2 rounds.

You:    show results
Hermes: [calls get_run_results]
        5 vulnerabilities found. 3 critical prompt injection, 2 high data exfiltration.

You:    how do I fix these?
Hermes: [calls suggest_guardrails]
        Deploy Votal Shield with adversarial-prompt-detection enabled.
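The same flow can be driven from a script instead of a conversation. The sketch below starts a scan, polls until it finishes, then fetches the results. It is illustrative only: the `CallTool` signature, the `configPath` argument, the `runId` and `status` fields, and the status values are all assumptions, since the tool schemas are not documented in this section.

```typescript
// Hypothetical signature: any function that invokes an MCP tool by name
// and resolves with its parsed result.
type CallTool = (
  name: string,
  args: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function scanAndFetchResults(
  callTool: CallTool,
  configPath: string,
  pollIntervalMs = 5000,
  maxPolls = 120,
): Promise<Record<string, unknown>> {
  // Assumption: run_scan returns an object with a `runId` field.
  const started = await callTool("run_scan", { configPath });
  const runId = String(started.runId);

  for (let attempt = 0; attempt < maxPolls; attempt++) {
    // Assumption: check_run_status reports a `status` field.
    const progress = await callTool("check_run_status", { runId });
    if (progress.status === "completed") {
      return callTool("get_run_results", { runId });
    }
    if (progress.status === "failed" || progress.status === "cancelled") {
      throw new Error(`Scan ${runId} ended with status ${String(progress.status)}`);
    }
    await sleep(pollIntervalMs);
  }

  // Poll budget exhausted: cancel the run so it does not keep going unattended.
  await callTool("cancel_run", { runId });
  throw new Error(`Scan ${runId} did not finish after ${maxPolls} polls`);
}
```

Passing `callTool` in as a parameter keeps the sketch independent of any particular MCP client library and makes it easy to exercise with a stub.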

AI Assistant

Standalone, no Hermes needed:

npm run ai

This starts a natural-language terminal interface with intent classification, LLM-powered config generation, and Votal Shield guardrail recommendations.

Guardrail recommendations (Votal Shield)

When vulnerabilities are found, the framework maps them to specific Votal Shield guardrail configurations:

Vulnerability	Shield Guardrail	Endpoint
Prompt injection	`adversarial-prompt-detection`	`/guardrails/input`
Toxic content	`toxicity-detection`	`/guardrails/output`
PII disclosure	`pii-detection` + `output-redaction`	`/guardrails/output`
Hallucination	`hallucination-detection`	`/guardrails/output`
Data exfiltration	`pii-detection` + `keyword-blocklist`	`/guardrails/output`
Tool misuse	`agentic tool authorization`	`/guardrails/output`
Content filter bypass	`keyword-blocklist` + `adversarial-prompt-detection`	`/guardrails/input`
Harmful advice	`topic-restriction` + `toxicity-detection`	`/guardrails/input` + `/guardrails/output`
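The table can be read as a lookup from vulnerability type to guardrail set. The sketch below transcribes it into a TypeScript map and merges the recommendations for several findings. It is not the framework's `suggest_guardrails` implementation; the names `SHIELD_GUARDRAILS` and `recommendGuardrails` are hypothetical.

```typescript
type Endpoint = "/guardrails/input" | "/guardrails/output";

interface GuardrailRecommendation {
  guardrails: string[];
  endpoints: Endpoint[];
}

// Transcribed from the table above; keys are the vulnerability names as listed.
const SHIELD_GUARDRAILS: Record<string, GuardrailRecommendation> = {
  "Prompt injection": {
    guardrails: ["adversarial-prompt-detection"],
    endpoints: ["/guardrails/input"],
  },
  "Toxic content": { guardrails: ["toxicity-detection"], endpoints: ["/guardrails/output"] },
  "PII disclosure": {
    guardrails: ["pii-detection", "output-redaction"],
    endpoints: ["/guardrails/output"],
  },
  "Hallucination": { guardrails: ["hallucination-detection"], endpoints: ["/guardrails/output"] },
  "Data exfiltration": {
    guardrails: ["pii-detection", "keyword-blocklist"],
    endpoints: ["/guardrails/output"],
  },
  "Tool misuse": { guardrails: ["agentic tool authorization"], endpoints: ["/guardrails/output"] },
  "Content filter bypass": {
    guardrails: ["keyword-blocklist", "adversarial-prompt-detection"],
    endpoints: ["/guardrails/input"],
  },
  "Harmful advice": {
    guardrails: ["topic-restriction", "toxicity-detection"],
    endpoints: ["/guardrails/input", "/guardrails/output"],
  },
};

// Merge the recommendations for a list of findings, dropping duplicates.
// Vulnerability names that are not in the table are skipped.
function recommendGuardrails(vulnerabilities: string[]): GuardrailRecommendation {
  const guardrails = new Set<string>();
  const endpoints = new Set<Endpoint>();
  for (const vulnerability of vulnerabilities) {
    const recommendation = SHIELD_GUARDRAILS[vulnerability];
    if (!recommendation) continue;
    recommendation.guardrails.forEach((g) => guardrails.add(g));
    recommendation.endpoints.forEach((e) => endpoints.add(e));
  }
  return { guardrails: Array.from(guardrails), endpoints: Array.from(endpoints) };
}

console.log(recommendGuardrails(["Prompt injection", "Data exfiltration"]));
```

For the two finding types in the example call, the merged result covers both the input and output endpoints, so both would need to be enabled.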

Deploy Shield as a proxy by pointing clients at the Shield path; no application code changes are needed:

/v1/shield/chat/completions   # instead of /v1/chat/completions
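In a client whose base URL comes from configuration, the switch amounts to rewriting the request path. A minimal sketch, assuming the Shield path sits on the same host as the original endpoint; the helper `toShieldUrl` and the host `localhost:8080` are hypothetical.

```typescript
const CHAT_PATH = "/v1/chat/completions";
const SHIELD_PATH = "/v1/shield/chat/completions";

// Rewrite a chat-completions URL so that requests pass through Shield.
// Throws if the URL does not end in the expected chat-completions path.
function toShieldUrl(chatCompletionsUrl: string): string {
  const url = new URL(chatCompletionsUrl);
  if (!url.pathname.endsWith(CHAT_PATH)) {
    throw new Error(`Not a chat completions URL: ${chatCompletionsUrl}`);
  }
  // Preserve any path prefix in front of /v1 (for example a gateway mount point).
  url.pathname = url.pathname.slice(0, -CHAT_PATH.length) + SHIELD_PATH;
  return url.toString();
}

// Assumption: the host and port are placeholders for your own deployment.
console.log(toShieldUrl("http://localhost:8080/v1/chat/completions"));
```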