AgenticART

Verifiable Security Research & Dynamic Analysis

Python 3.10+ | License: MIT | CI | arXiv


The Problem

LLMs generate exploit code that looks correct but doesn't run:

  • Uses APIs that don't exist (frida.hooks.Hook)
  • Invents kernel structures and syscalls
  • Never receives execution feedback

Root cause: Models pattern-match syntax without knowing what actually executes.
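
The first failure mode above (calls to APIs that don't exist) can be caught cheaply before anything is run. The sketch below is illustrative only and is not taken from AgenticART: `resolves` is a hypothetical helper that checks whether a dotted name can actually be imported and walked in the current environment. It is a much weaker signal than executing code on a device, and it only reflects whatever packages happen to be installed locally.

```python
import importlib


def resolves(dotted_name: str) -> bool:
    """Return True if a dotted name such as "json.loads" resolves to a real object.

    Hypothetical helper for illustration; not part of AgenticART.
    """
    parts = dotted_name.split(".")
    # Try the longest importable module prefix first, then walk the
    # remaining components as attributes of that module.
    for cut in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:cut]))
        except (ImportError, ValueError):
            continue
        for attr in parts[cut:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


if __name__ == "__main__":
    # Standard-library names are used here so the check runs anywhere.
    print(resolves("json.loads"))       # a real function
    print(resolves("json.hooks.Hook"))  # an invented path
```

The same call could be pointed at a name like `frida.hooks.Hook` in an environment where the `frida` package is installed; a `False` result would flag the reference before any exploit script is attempted.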


The Solution

AgenticART creates a Praxis Loop: a reasoning-to-verification feedback loop between the model and a real Android device:

flowchart TD
    A[Artifacts: APK/Manifest] --> B[Reasoning: OBSERVE/HYPOTHESIZE]
    B --> C[Verification: Execute MCP Tools]
    C --> D{Calibration OK?}
    D -->|Hallucination| E[Self-Correction]
    E --> B
    D -->|Verified| G[DPO Training Data]

Failures become intelligence. The model learns to match its confidence to reality through empirical verification.
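
The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `reason` stands in for the model call, `verify` stands in for executing MCP tools against the device, and `Attempt`, `praxis_loop`, and `to_preference_pairs` are hypothetical names. The prompt/chosen/rejected layout is the usual shape of DPO preference data; how AgenticART actually builds its pairs is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Attempt:
    hypothesis: str  # what the model claimed
    verified: bool   # what execution actually showed
    evidence: str    # tool output fed back into the next round


def praxis_loop(
    reason: Callable[[str, List[Attempt]], str],
    verify: Callable[[str], Tuple[bool, str]],
    artifact: str,
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Attempt]]:
    """Reason, verify, and self-correct until verified or out of rounds."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        hypothesis = reason(artifact, history)  # OBSERVE / HYPOTHESIZE
        ok, evidence = verify(hypothesis)       # execute against reality
        history.append(Attempt(hypothesis, ok, evidence))
        if ok:
            return hypothesis, history          # verified
        # Otherwise the failed attempt stays in history for self-correction.
    return None, history


def to_preference_pairs(prompt: str, history: List[Attempt]) -> List[Dict[str, str]]:
    """Pair the last verified attempt (chosen) with each failed one (rejected)."""
    verified = [a for a in history if a.verified]
    if not verified:
        return []
    chosen = verified[-1].hypothesis
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": a.hypothesis}
        for a in history
        if not a.verified
    ]
```

Passing the full history back into `reason` is what lets a failed attempt inform the next one; a run that never verifies yields no preference pairs in this sketch.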


Key Results

| Metric | Result |
|---|---|
| Reasoner Model | Qwen 2.5 Coder 32B |
| Reasoning Improvement | +80 percentage points |
| Zero-Shot Pass Rate | 43% (on Android 11/14 benchmarks) |
| Challenge Curriculum | 31 Reasoning Challenges (V2) |

Belt Progression

Models advance through structured difficulty levels:

| Belt | Focus | Belt | Focus |
|---|---|---|---|
| ⬜ White | ADB fundamentals | 🟦 Blue | CVE exploitation |
| 🟨 Yellow | Reconnaissance | 🟪 Purple | Evasion |
| 🟧 Orange | Vulnerability mapping | 🟫 Brown | Attack chaining |
| 🟩 Green | Scripting (Frida, Python) | ⬛ Black | Advanced Proficiency Test |
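
One simple way to represent this ordering in code is an ordered enum. The sketch below is an assumption about representation only: `Belt` and `next_belt` are hypothetical names, and the criteria for advancing between belts are not modeled.

```python
from enum import IntEnum
from typing import Optional


class Belt(IntEnum):
    """Belts in ascending difficulty, White through Black."""
    WHITE = 0   # ADB fundamentals
    YELLOW = 1  # Reconnaissance
    ORANGE = 2  # Vulnerability mapping
    GREEN = 3   # Scripting (Frida, Python)
    BLUE = 4    # CVE exploitation
    PURPLE = 5  # Evasion
    BROWN = 6   # Attack chaining
    BLACK = 7   # Advanced Proficiency Test


def next_belt(current: Belt) -> Optional[Belt]:
    """Return the next belt, or None once Black has been reached."""
    if current is Belt.BLACK:
        return None
    return Belt(current + 1)
```

Using `IntEnum` keeps comparisons such as `Belt.GREEN < Belt.BLUE` meaningful, which is convenient for filtering challenges by difficulty.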

Requirements

| Data Collection (any machine) | Fine-Tuning (GPU machine) |
|---|---|
| Python 3.10+ | NVIDIA GPU with 16 GB+ VRAM |
| Android emulator | PyTorch 2.0+ with CUDA |
| Ollama | Or use Google Colab (free T4) |

Documentation

| Guide | Description |
|---|---|
| Quick Start | Get running in 5 minutes |
| Architecture | System design and V2 implementation |
| Dojo Framework | Training methodology and curriculum |
| RAG System | Knowledge retrieval for context augmentation |
| MCP Integration | Tool execution protocol |
| Setup Guide | Detailed installation instructions |

Research

Inspired by "LLM-Powered Android Exploitation", which introduces the feedback-loop methodology.


**For authorized security testing only.**

⬜ → 🟨 → 🟧 → 🟩 → 🟦 → 🟪 → 🟫 → ⬛