Researchers Break Open AI’s Black Box—and Use What They Find Inside to Control It

Source: Singularity Hub | Published: 2026-02-23T22:35:51+00:00

Researchers introduced the Recursive Feature Machine (RFM) to extract concept vectors from large AI models, enabling efficient monitoring and steering of behavior across language, vision-language, and reasoning systems. The technique identified both harmful ('anti-refusal') and beneficial ('anti-deception') vectors, worked across languages, and transferred between models. It required fewer than 500 samples and a single A100 GPU to run.
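
To make "monitoring and steering with a concept vector" concrete, here is a minimal toy sketch. It does not implement RFM: it swaps in a much simpler difference-of-means direction, and the function names, the 8-dimensional synthetic "activations", and the steering strength are all assumptions made for illustration rather than details from the research.

```python
import math
import random


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_vector(with_concept, without_concept):
    """Unit-length difference of class means (a simple stand-in, NOT RFM)."""
    mu_pos = mean_vector(with_concept)
    mu_neg = mean_vector(without_concept)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def monitor(activation, vector):
    """Monitoring: project an activation onto the concept direction."""
    return sum(a * v for a, v in zip(activation, vector))


def steer(activation, vector, strength):
    """Steering: shift an activation along the concept direction."""
    return [a + strength * v for a, v in zip(activation, vector)]


# Synthetic stand-in for hidden activations: the hypothetical concept
# shifts dimension 0 only; every other dimension is pure noise.
rng = random.Random(0)
DIM = 8


def sample(shift):
    return [rng.gauss(0, 1) + (shift if i == 0 else 0.0) for i in range(DIM)]


positives = [sample(3.0) for _ in range(200)]  # examples with the concept
negatives = [sample(0.0) for _ in range(200)]  # examples without it

v = concept_vector(positives, negatives)
h = sample(0.0)
before = monitor(h, v)
after = monitor(steer(h, v, 4.0), v)
print(f"concept score before steering: {before:.3f}, after: {after:.3f}")
```

Because the vector has unit length, steering with strength 4.0 raises the monitored score by exactly 4.0. In a real model the activations would come from a chosen layer rather than a random generator, and the direction would be learned by the method the researchers describe.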

Why it matters: GPT-4o concept vectors show product teams must update model guardrails to block 'anti-refusal' vulnerabilities.
