Affiliate Ram Shankar Siva Kumar and coauthors "present a practical scanner for identifying sleeper agent-style backdoors in causal language models." ...