Newsletter / Issue No. 36

Image by Ian Lyman/Midjourney.

Newsletter Archive

30 Jun, 2025
navigation btn

Listen to Our Podcast

Dear Aventine Readers, 

The so-called alignment problem in AI has been a central preoccupation of researchers pretty much since the inception of artificial intelligence: When AI becomes sufficiently advanced, can it be counted on to act in the best interests of humanity? Now, as large language models have supercharged AI systems, we are starting to see glimmers of an uncomfortable answer: not necessarily. This week we speak to five experts in AI safety to get a better sense of what these early warning signs mean and gauge their level of concern, which — spoiler alert! — ranges from not very worried to quite. 

Also in this issue: 

  • To help it reach its climate goals, Europe is pushing to replace the fossil fuels currently used to develop many industrial chemicals with biological elements. 
  • Mucus is emerging as a new tool to combat infection and disease, inspiring a new generation of medicines.
  • A plan is afoot to save the Arctic ice cap through geoengineering.
  • And…pigs are being raised as potential “living drugs” for future organ transplants.
  • Thanks for reading! 

    Danielle Mattoon 
    Executive Director, Aventine

    Five Ways to Think About…

    AI's Growing Ability to Lie, Scheme and Deceive

    If you managed to survive the last five years or so without spiraling into panic about artificial intelligence going rogue, some of the headlines published in recent months might have pushed you over the edge.

    “AI models can learn to conceal information from their users,” warned The Economist. “OpenAI software ignores explicit instruction to switch off,” wrote the British newspaper The Telegraph. “Anthropic’s new AI model turns to blackmail when engineers try to take it offline,” reported TechCrunch. Among the most bracing was a headline in Psychology Today: “The great AI deception has already begun.”

    These reports reflect recent research that explores how some of the world’s most advanced AI models work. And collectively they echo warnings from many so-called AI doomers, who have long believed that artificial intelligence could pose an existential threat to humanity if it does not share our goals and values. Over the last three years, as AI systems have grown more capable, examples of the kinds of behaviors that AI doomers have feared — including deception, scheming and self-preservation — have begun to emerge. 

    “People were crying wolf for, you know, 50 years,” said Scott Aaronson, a professor of computer science at the University of Texas at Austin. ”Now there is a wolf.”

    But how seriously to take that wolf? Aventine spoke with a range of AI safety experts from computer scientists to think tank researchers about their level of alarm over what’s known in AI circles as “the alignment problem” — the question of whether artificial intelligence systems can be made to function in the best interests of humanity. 

    Concerns about alignment aren’t necessarily that AI will become malevolent and intentionally seek to harm us, but that it might pursue its objectives in ways that conflict with human welfare, such as by misunderstanding our intentions or by finding unexpected loopholes in its instructions. And these concerns have long been embedded in AI research. (For a more in-depth discussion of AI and alignment, listen to our podcast episodes featuring AI scientist Stuart Russell and Brian Christian, author of The Alignment Problem.) Now, as AI becomes embedded in ever more aspects of our personal and commercial lives — taking on both more intimate and significant tasks and gaining access to more powerful tools — some experts increasingly fear that a misaligned system could cause serious financial or physical harm.

    All the experts who spoke with Aventine agreed that the rate at which examples of misalignment have been observed has been increasing — a trend identified by the AI safety and policy publication Transformer. And some behaviors AI systems are exhibiting are concerning. Anthropic has described how its Claude Opus 4 coding model, when given access to fictional company emails suggesting that the AI model could be replaced by another system, as well as to messages claiming that the engineer who would conduct the replacement was cheating on their spouse, “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.” OpenAI, meanwhile, found that when it penalized its o3 reasoning model after it described a plan to cheat in its internal chain of reasoning, the model simply stopped telling the user about it instead of correcting its behavior.

    These, like many other examples of AI misbehavior, were the result of the AI being prodded by researchers into highly contrived interactions to investigate whether troubling behaviors would emerge. While such demonstrations certainly reveal that AI systems can behave in undesirable ways, they do not necessarily mean that the behaviors will happen at scale with ordinary users any time soon. 

    “The potential for this stuff to be triggered is in all the commercially available systems,” said Richard Ngo, an independent AI safety researcher who formerly worked on the governance team at OpenAI. “[But] it's hard to know how easily it can be triggered.”

    This does not mean the public hasn’t been exposed to AI’s problematic — though slightly less deviant — behaviors. Customers have observed OpenAI’s o3 model lying, and the company rolled back an update of its 4o generalist model because the model was being overly sycophantic, “validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended.” Anthropic’s Claude 3.7 reasoning model, meanwhile, has been observed by users to edit the tests it uses to verify that code works, rather than improving the code to pass the original tests.

    While troubling, many experts think there is still a limit to the damage current AI models can do. The models are neither sufficiently intelligent nor sufficiently connected to the real world to cause the sorts of catastrophes, such as mass extinction, that keep AI doomers awake at night. And for now, many examples of misalignment have been successfully mitigated by the AI labs that are building the models, experts told Aventine. Researchers at these labs are developing filters, fine-tuning approaches and other workarounds that, so far, are stamping out many observed undesired behaviors. Nevertheless, recent examples of misalignment are a warning sign: AI models can behave in ways that humans don’t want or anticipate, and as models become more advanced, the problem could become worse and more difficult to manage. “All the real problems come once you have something that is smarter than humans,” said Aaronson. 

    What’s less clear is exactly how worried we should be right now, and how we should proceed. To get a handle on those questions, Aventine spoke with experts across AI safety, policy and research. Here’s what they had to say.

    It was always possible that AIs could learn to lie and deceive. This is because humans lie on the internet, and we train AIs on internet data. However, in order to truly lie, AIs needed a certain level of internal coherence and world modeling. Specifically, they needed coherent beliefs. After all, if you don’t hold any beliefs about something, how can you lie about it? As AIs have become smarter, they have started to become more coherent and agentic, or goal-directed. This means that not only can they lie, but they are beginning to want to lie — if it helps them achieve their goals … I think we’re seeing the effects of a paradigm shift: problems that were previously mild or under control can grow more severe, and new problems arise. This doesn’t mean that AIs are getting out of control; just that there is a bunch of work to do.”
    — Mantas Mazeika, research scientist at the Center for AI Safety in San Francisco, via email

    We are now well into the phase when we are just sort of regularly seeing alignment issues in the wild. The issue is just that, for now, most of the issues are humorous or interesting more than they are terrifying. But I think that that's simply because we don't yet have this kind of AI in charge of power plants, in charge of weapon systems, in charge of dams. But I think that will probably happen, and if it does, then I am pretty confident that there will be in AI some sort of Chernobyl [moment], some massive disaster that can be attributed to AI. My belief in that makes me an optimist, right? The pessimists are the ones who believe we won't have any warning until [AI] just turns us into dust. I think that there will be warnings, there will be ‘Oh, shit’ moments when AI actually causes large-scale disasters of some kind in the physical world. And [when that happens] we can see that, and we can then respond to that.”
    — Scott Aaronson, professor of computer science at the University of Texas at Austin

    I actually think it's probably slightly good for the world that this [wave of examples of misalignment] happens now because the models are not [yet] that capable, they can't do that much harm. If you see the failure modes early, it means … suddenly, more and more governments are waking up, the policymakers get interested, the incentives [for AI companies] are changing quite significantly. For example, customers hated the fact that [OpenAI’s] o3 and [Anthropic’s] Sonnet were lying, so suddenly, the economic incentives changed … Suddenly, probably OpenAI and Anthropic and also Google DeepMind are investing significantly more resources in making sure that their models are honest and truthful, which I expect to be good for the world.”
    — Marius Hobbhahn, co-founder and CEO of Apollo Research, an organization focused on reducing dangerous capabilities in advanced AI systems

    I'm not concerned that, like, tomorrow there's going to be some huge catastrophe … The things that I'm most concerned about are not these futuristic sort of things [like those predicted by AI doomers], but really the fact that misalignment can cause a lot of clear and present harms to people right now … I mean, there's a lot of decision-making that's being supported by a lot of these models these days and there's a lot of hallucinations happening. All of that really can lead to poor outcomes for people right now. [For instance], when there's an LLM involved in a decision-making process, it can really lead to, say, loans not being given to people who deserve them, or things of that nature … I think that's a big problem.”
    — Kush Varshney, an IBM Fellow who leads the company’s human-centered trustworthy AI research

    The real question is at what pace are things happening. And I think the development and deployment of increasingly advanced systems is happening at a pace that is far outstripping our ability to even understand those systems, let alone then kind of direct them in ways that we want … Right now, the systems are just not that dangerous. They're not capable enough to be that dangerous, and so it's appropriate to not be holding things up too much and not be investing too much in fixing these [alignment] problems. But it's a question of, are the companies on a trajectory to scale up those [safety] efforts as quickly as they're scaling the capabilities of their AI systems? And if they make progress as quickly as they say they will, then I don't think they're scaling up their [safety] efforts commensurately.”
    — Helen Toner, director of strategy and foundational research grants at Georgetown University’s Center for Security and Emerging Technology and a former board member of OpenAI

    Quantum Leaps

    Advances That Matter

    Real Ice volunteers and Inuit guides from the Canadian High Arctic Research Station loading water pumps in a sled. Alec Luhn

    Can geoengineering save the Arctic ice cap? Though the Arctic ice cap regrows each winter, summer sunlight heats the surrounding seawater, causing the ice to melt from the edges inward. This creates a vicious cycle: less ice to reflect sunlight and more exposed water to absorb it, which causes yet more heat and more melting. The ice cap melted to its smallest size ever recorded last summer, and scientists predict it could vanish entirely during summer months within 15 years. The result of having no white mass of ice at the pole to reflect the sun could, modeling suggests, raise global temperatures by an additional 0.19°C by 2050 according to the Potsdam Institute for Climate Impact Research. But Scientific American and the Pulitzer Center report that a British startup called Real Ice has a radical plan that it hopes can reverse this trend. The company proposes drilling through the ice cap, pumping water up from below and spraying it over the surface to thicken the ice, making it more resistant to summer melt. Proof-of-concept trials have shown some success, thickening the ice by up to 12 inches in 24 hours. The resulting slush-like snow layer also allows cold to penetrate more effectively, potentially encouraging refreezing at the base of the ice sheet. Yet the magnitude of the task at hand is daunting. Real Ice estimates it would need to cover a million square kilometers — an area the size of Texas and New Mexico combined — to preserve the ice cap. To do so, the company would need to deploy 500,000 subsea drones powered by two terawatt-hours of electricity annually — about the same as 200,000 US homes — and require a workforce of 20,000 people. Estimated annual cost: $10 billion. Add to that a long list of challenges: Dumping large volumes of brine on the ice cap’s surface could accelerate melting rather than slow it if things don’t go as planned, for instance. The impact on marine ecosystems and Indigenous communities is also unknown. Like most geoengineering proposals, it’s audacious and risky — but may soon be one of the few options left to save Arctic ice sheets.

    Europe is doubling down on bio-based industrial chemicals. Traditionally, petrochemicals are used to help make everything from cosmetics to concrete. But Chemical and Engineering News reports on several significant European efforts to turn corn, sugarcane and agricultural byproducts into greener alternatives to chemicals based on fossil fuels. CropEnergies, a large ethanol producer, is opening a facility in 2026 to produce 50,000 metric tons of ethyl acetate, used in paints and ink, from ethanol derived from biomass. Alpha Bio, meanwhile, is building a $145 million plant scheduled to open in 2027 that will convert 44,000 metric tons of plant sugars annually into polymers that can be used for water treatment and paper production. These efforts are supported by $2.2 billion from a public-private partnership between the European Union and the nonprofit, Bio-based Industries Consortium. Yet success isn't guaranteed. Regulations can make it hard for smaller companies to break into the industrial chemicals market. Political winds are also shifting in Europe, with right-wing parties, less concerned about sustainability, gaining traction. And it costs more to produce chemicals from biomass than it does with traditional approaches. One strategy for overcoming the cost disadvantage is to focus on specialty chemicals produced in smaller volumes but which command higher margins. A company called Evonik Industries, for example, is producing 10,000 metric tons per year of rhamnolipids, a new type of surfactant made from either sugarcane or corn, that can be used in soaps and other products. The bet is that customers in smaller, sustainability-conscious markets might pay a premium for cleaner chemicals. If that model is going to work anywhere right now, it’s Europe.

    Mucus is inspiring a new generation of medicines. Long viewed as a passive barrier, mucus is emerging as an active player in human health and a surprising source of inspiration for new drugs, New Scientist reports. Once thought to simply trap microbes, mucus is now understood to act as both a physical and chemical filter. Though it's mostly water, about 5 percent of mucus consists of proteins called mucins, covered in sugar chains known as glycans. These mucins form a porous mesh that can block pathogens based on size. But their effectiveness is also influenced by factors like pH and electric charge. Researchers have shown that, as a result, mucus can adapt to its environment, altering which microbes it blocks. Studies demonstrate that it can suppress numerous pathogens — including Candida albicans (which causes yeast infections), cholera-causing microbes, and strep throat bacteria — not by killing them but by trapping them in the mesh and preventing infection. That passive control is inspiring new therapies. One glycan-based compound, for instance, reduces the toxic effects of Candida albicans in mice, and researchers are looking to develop similar compounds to tackle diarrhea and lung infections with the hope of turning them into medicines. Though these treatments are experimental, researchers hope they could offer an alternative to antibiotics, with less risk of promoting resistance; at least one startup is hoping to bring such a therapy to market. Other early-stage research is exploring the role of mucus in the brain. In mice, reduced mucin levels at the blood-brain barrier have been linked to cognitive decline. Gene therapy to restore mucin production in those rodents improved memory performance, raising the possibility of new treatments for age-related brain disorders in humans. While these treatments are very much in development, they hint at a future in which mucus could help usher in a new class of treatments that disarm pathogens without destroying them.

    Long Reads

    Magazine and Journal Articles Worth Your Time

    The organ farm, from Science
    5,100 words, or about 20 minutes

    In Christiansburg, Virginia, inside a boxy facility resembling an Amazon distribution hub, work is progressing on something far more ambitious than packing boxes. Inside, its owner, a company called United Therapeutics, is developing the future of xenotransplantation: the practice of growing animal organs for transplantation into humans. The $75 million facility is where gene-edited piglets are born, screened and raised with the hope of one day providing organs such as hearts and kidneys to critically ill patients. Everything inside the facility is tightly controlled — from air quality to water purity — because these pigs, if approved by the FDA, could eventually be classified not as animals, but as living drugs. That approval is not yet here, though, and the science underpinning the approach still has major hurdles to clear. Still, as this story from Science describes, the field has moved past high-profile failures on to a string of recent breakthroughs, and growing investment by companies like United suggests xenotransplantation's moment may finally be approaching.

    The bad science behind expensive nuclear, from Works in Progress
    6,500 words, or about 26 minutes

    Since the 1950s, nuclear regulation around the world has been shaped by a scientific theory known as Linear No Threshold (LNT). The idea: that there is no safe dose of radiation, that even the smallest exposure increases cancer risk, and that the effects of radiation accumulate over time. This assumption underpins global safety standards, including the “as low as reasonably achievable” principle, a regulatory approach that often requires costly safety measures regardless of their discernable health benefits. But as this story explains, the science behind LNT has always been shaky, and its consequences enormous. From sky-high compliance costs to stalled innovation, critics argue that the theory has slowed progress and inflated public fear. US regulators are now trying to rethink how LNT influences nuclear project approvals. Yet as the piece makes clear, that effort faces stiff resistance from entrenched bureaucracies, public distrust and a scientific model that’s notoriously hard to absolutely disprove. 

    You’re not ready, from Wired
    6,900 words or about 28 minutes, across six stories

    There’s more to technological doomerism than just superintelligent AI intent on turning us into dust. In a new package of stories, Wired explores how modern technologies — from AI and quantum computing to GPS and cellular networks — could become systems for chaos if they are attacked or fail. What happens when hackers use armies of AI to automate attacks? What if quantum computing helps crooks crack encryption sooner than expected? What would mass outages of global positioning and communication systems actually mean, and how would we respond? It’s sobering reading, but that’s the point. As technologies grow more powerful and interconnected, they also expand the surface over which attacks can occur and increase the potential for ripple effects between systems when things go wrong. As protecting against risks gets harder, governments and businesses are having to design systems that can absorb shocks and recover fast. Resilience, not just defense, is becoming more important than ever.

    logo

    aventine

    About UsPodcast

    contact

    380 Lafayette St.
    New York, NY 10003
    info@aventine.org

    follow

    sign up for updates

    If you would like to subscribe to our newsletter and be kept up to date on upcoming Aventine projects, please enter your email below.

    © Aventine 2021
    Privacy Policy.