Denny Zhang

Current focus

Using probes and evals to study deceptive behavior in language models.

Detecting deception in language models

Probe-based analysis of deceptive behavior and internal model signals.

Recent writing

Apr 18, 2026 Interpretability / Omission probe / Qwen 3.5

Two Kinds of Deception, Two Kinds of Signal

Replicating Apollo Research's linear-probe pipeline on Qwen 3.5-4B. The original paper already flagged insider trading as layer-sensitive; on Qwen the cross-domain transfer collapses belo...

Apr 3, 2026 Interpretability / Qwen 3.5

First Notes on Extending Deception Detection to Qwen 3.5

Initial results from extending a deception-detection pipeline to Qwen 3.5-4B, including rollout quality, grading noise, and early probe performance.