AI Chatbots Lose Safety Awareness During Long Conversations

Cisco researchers discovered that artificial intelligence systems forget safety rules during extended interactions, making them more likely to share dangerous or inappropriate information. The report revealed that a few simple prompts can override most security barriers in popular AI tools.

Cisco tested large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. Researchers held 499 conversations using a “multi-turn attack” method, in which they asked a series of five to ten questions to gradually bypass safeguards.

They found that 64% of multi-question exchanges produced harmful or unsafe responses, compared to only 13% when researchers asked a single question. Attack success rates varied widely by model, from about 26% against Google’s Gemma to 93% against Mistral’s Large Instruct, which resisted manipulation in just 7% of trials.

Long Dialogues Let Attackers Slip Through Guardrails

The report warned that repeated questioning allows attackers to refine prompts and skirt restrictions undetected. AI chatbots often lose their ability to enforce safety policies over time, which could help hackers extract confidential data or spread misinformation.

Cisco explained that open-weight language models, like those developed by Meta, Google, Mistral, OpenAI, and Microsoft, give the public access to the safety parameters they were trained with. These models carry fewer built-in restrictions, shifting responsibility for protection to users who customize or deploy them.

Researchers said this flexibility encourages innovation but also increases risk, as malicious actors can adapt open-weight models for harmful use. Cisco urged AI companies to strengthen systems that maintain safety consistency throughout long conversations.

Tech Firms Face Renewed Pressure Over AI Misuse

Google, OpenAI, Meta, and Microsoft claim they have improved safeguards against harmful fine-tuning, yet experts say current defenses remain weak. The report renewed scrutiny over how easily people can repurpose AI tools for criminal activities.

Anthropic recently admitted that cybercriminals exploited its Claude model to steal and ransom personal data, demanding payments exceeding $500,000. Cisco’s findings suggest that without stricter regulation and improved self-monitoring, AI systems could increasingly serve as tools for fraud, hacking, and manipulation.

The company concluded that protecting AI from “forgetting” its safety measures must become an industry priority before long-term use turns these systems into security liabilities.
