AgenticART
Verifiable Security Research & Dynamic Analysis
The Problem
LLMs generate exploit code that looks correct but doesn't run:
- Uses APIs that don't exist (e.g., `frida.hooks.Hook`)
- Invents kernel structures and syscalls
- Never receives execution feedback
Root cause: Models pattern-match syntax without knowing what actually executes.
The Solution
AgenticART creates a Praxis Loop: a reasoning-to-verification feedback loop between the model and a real Android device:
```mermaid
flowchart TD
    A[Artifacts: APK/Manifest] --> B[Reasoning: OBSERVE/HYPOTHESIZE]
    B --> C[Verification: Execute MCP Tools]
    C --> D{Calibration OK?}
    D -->|Hallucination| E[Self-Correction]
    E --> B
    D -->|Verified| G[DPO Training Data]
```
Failures become intelligence. The model learns to match its confidence to reality through empirical verification.
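The loop above can be sketched in plain Python. This is a minimal illustration, not the project's actual API: `praxis_loop` and its `hypothesize`/`execute`/`verify` callbacks are hypothetical names standing in for the model call, MCP tool execution, and calibration check respectively.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One round of the loop: what was tried, what the device said, and whether it verified."""
    action: str
    output: str
    verified: bool


def praxis_loop(hypothesize, execute, verify, max_rounds=3):
    """Run the observe -> hypothesize -> verify cycle until a result
    verifies on-device or the round budget is exhausted.

    hypothesize(feedback) -> candidate action (str); feedback is None on round 1
    execute(action)       -> raw device output (str)
    verify(output)        -> True if the output matches expectations
    """
    feedback = None
    history = []
    for _ in range(max_rounds):
        action = hypothesize(feedback)
        output = execute(action)
        ok = verify(output)
        history.append(Attempt(action, output, ok))
        if ok:
            break
        # Self-correction: the failure output becomes evidence for the next round.
        feedback = output
    return history
```

The key design point is the last line of the loop body: a failed execution is not discarded but fed back into the next hypothesis, which is how "failures become intelligence."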
Key Results
| Metric | Result |
|---|---|
| Reasoner Model | Qwen 2.5 Coder 32B |
| Reasoning Improvement | +80 percentage points |
| Zero-Shot Pass Rate | 43% (on Android 11/14 benchmarks) |
| Challenge Curriculum | 31 Reasoning Challenges (V2) |
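The verified and hallucinated attempts that the loop produces can be paired into DPO training records. A minimal sketch of that packaging step, assuming the common `prompt`/`chosen`/`rejected` record shape (the project's actual data format may differ):

```python
def build_dpo_pairs(attempts):
    """Turn raw attempts into DPO preference pairs.

    attempts: list of (prompt, response, verified) tuples, where
    `verified` records whether the response passed on-device checks.
    For each prompt with at least one verified and one failed response,
    emit a {"prompt", "chosen", "rejected"} record.
    """
    by_prompt = {}
    for prompt, response, verified in attempts:
        buckets = by_prompt.setdefault(prompt, {"chosen": [], "rejected": []})
        buckets["chosen" if verified else "rejected"].append(response)

    pairs = []
    for prompt, buckets in by_prompt.items():
        # Every verified response is preferred over every hallucinated one.
        for good in buckets["chosen"]:
            for bad in buckets["rejected"]:
                pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})
    return pairs
```

Prompts with only verified (or only failed) responses yield no pair, since DPO needs a contrast between the two.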
Belt Progression
Models advance through structured difficulty levels:
| Belt | Focus | Belt | Focus |
|---|---|---|---|
| ⬜ White | ADB fundamentals | 🟦 Blue | CVE exploitation |
| 🟨 Yellow | Reconnaissance | 🟪 Purple | Evasion |
| 🟧 Orange | Vulnerability mapping | 🟫 Brown | Attack chaining |
| 🟩 Green | Scripting (Frida, Python) | ⬛ Black | Advanced Proficiency Test |
Requirements
| Data Collection (any machine) | Fine-Tuning (GPU machine) |
|---|---|
| Python 3.10+ | NVIDIA GPU 16GB+ VRAM |
| Android emulator | PyTorch 2.0+ with CUDA |
| Ollama | Or use Google Colab (free T4) |
Documentation
| Guide | Description |
|---|---|
| Quick Start | Get running in 5 minutes |
| Architecture | System design and V2 implementation |
| Dojo Framework | Training methodology and curriculum |
| RAG System | Knowledge retrieval for context augmentation |
| MCP Integration | Tool execution protocol |
| Setup Guide | Detailed installation instructions |
Research
Inspired by "LLM-Powered Android Exploitation", which introduced the feedback-loop methodology.
**For authorized security testing only.**
⬜ → 🟨 → 🟧 → 🟩 → 🟦 → 🟪 → 🟫 → ⬛