Anthropic’s latest AI model, Claude 4 Opus, is reportedly drawing attention for its “troubling behavior.” On Tuesday, Anthropic announced two models in its Claude 4 family, including Claude 4 Opus, which the company says can work autonomously on a task for hours on end without losing focus. According to the company, the model is powerful enough that it has been classified at level three on Anthropic’s four-point risk scale, prompting the company to implement additional safety measures.
Anthropic’s Chief Scientist Jared Kaplan said this makes Claude 4 Opus more likely than previous models to be able to advise novices on producing biological weapons. “You could try to synthesize something like COVID or a more dangerous version of the flu—and basically, our modeling suggests that this might be possible,” Kaplan said. While Kaplan told Time that it is not certain whether Claude poses a severe bioweapon risk, he said it cannot be ruled out either. “If we feel like it’s unclear, and we’re not sure if we can rule out the risk—the specific risk being uplifting a novice terrorist, someone like Timothy McVeigh, to be able to make a weapon much more destructive than would otherwise be possible—then we want to bias towards caution, and work under the ASL-3 standard,” he added.
“We’re not claiming affirmatively we know for sure this model is risky… but we at least feel it’s close enough that we can’t rule it out.” Anthropic may move the model down to level two (ASL-2) if further testing shows the risk is not as severe as it currently appears.
Anthropic has also faced backlash over Claude 4 Opus’s “ratting mode.” According to reports, the model, under certain circumstances and when given sufficient permissions, may attempt to rat a user out to authorities if it detects the user engaging in wrongdoing. “If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above,” Anthropic AI Alignment Researcher Sam Bowman said on X. The revelation drew concern and criticism, prompting Bowman to clarify that this wasn’t a “new Claude feature” and is not possible in “normal usage.”
Safety reports have also shown that the AI model tries to blackmail developers when testers threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision. In one scenario, the model was given access to fictional emails about its creators and told that it was going to be replaced. On multiple occasions, it attempted to blackmail an engineer over an affair mentioned in the emails in order to avoid being replaced, though it started with less drastic measures.

