The rush to stay competitive in the AI race means innovation, performance, and efficiency are largely outpacing advancements in a critical area: security.
New machine learning capabilities are fueling the development of tools like large language models (LLMs) and introducing novel cybersecurity threats to enterprises. Chief among these threats are poisoning attacks, which target training data to disrupt a model’s behavior and corrupt the reliability of its outputs. AI data poisoning is difficult to detect and rectify, but organizations can reduce their risk by putting proactive measures in place to mitigate its impact.
Data poisoning happens when a threat actor attempts to compromise the integrity of AI model data. The attacker’s intention is typically to cause the model to generate harmful, misleading, or incorrect outputs.
AI models are inherently vulnerable to this type of attack because they are trained on massive datasets. This training enables them to learn patterns and generalize knowledge to new information. However, considering the scale of data required to train AI models, it’s difficult for developers to thoroughly review all the data for signs of malicious content.
Additionally, much of the data enterprises use to train machine learning models originates from public web pages or user inputs. External actors can easily access and manipulate either of these sources. Attackers leveraged this approach when they caused Microsoft’s Tay chatbot to generate harmful content after the bot learned from racist comments and obscenities in user posts directed at it.
There are several ways to approach data poisoning. Attackers may attempt to modify, remove, or add information to a training dataset. Techniques generally fall into one of two categories: targeted attacks, which manipulate how a model behaves on specific inputs or functions, and non-targeted attacks, which are more chaotic, degrading overall performance by corrupting anything an adversary can access.
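As a toy illustration (hypothetical data, NumPy only), the sketch below shows what each category can look like in practice: a non-targeted attack flips a small fraction of labels to erode overall accuracy, while a targeted attack stamps a trigger pattern onto a few samples and relabels them to plant a backdoor.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # toy feature matrix (hypothetical)
y = rng.integers(0, 2, size=1000)      # binary labels

# Non-targeted poisoning: flip 2% of labels at random to erode overall accuracy.
flip_idx = rng.choice(len(y), size=int(0.02 * len(y)), replace=False)
y_poisoned = y.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]

# Targeted poisoning (backdoor): stamp a trigger pattern onto a few samples
# and relabel them, so a trained model associates the trigger with class 1.
trigger = np.zeros(20)
trigger[:3] = 5.0                      # conspicuous values in three features
backdoor_idx = rng.choice(len(y), size=10, replace=False)
X_poisoned = X.copy()
X_poisoned[backdoor_idx] += trigger
y_poisoned[backdoor_idx] = 1
```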
Poisoning attacks can have a huge impact on the integrity of enterprise AI, essentially crippling a model’s ability to generate reliable outputs. Once a model is poisoned, an organization can no longer safely rely on it for various tasks. In some cases, undetected poisoning attacks can harm businesses and downstream communities.
For instance, companies now use AI to direct autonomous vehicles and support healthcare diagnostics. AI data poisoning may cause an autonomous car to misclassify stop signs or prompt healthcare professionals to make poor treatment decisions, both of which pose serious public safety risks. Data poisoning attacks could also expose enterprises to legal repercussions and reputational damage. What’s more, this attack vector can derail efforts and investments in establishing a responsible AI framework.
The effects can worsen the longer an attack remains unidentified. Many AI models constantly learn from user inputs, and this continuous evolution makes it difficult to uncover signs of compromise. Even small changes relative to the volume of training data can significantly impact model performance. Once an attack is detected, it’s also challenging for developers to address since manually reviewing and correcting massive datasets is time-consuming and prone to error.
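To see how little poisoned data it takes, here is a quick, hypothetical experiment with scikit-learn: flipping even a few percent of training labels measurably drags down test accuracy (exact numbers will vary by model and dataset).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for poison_rate in [0.0, 0.02, 0.05, 0.10]:
    y_p = y_tr.copy()
    idx = rng.choice(len(y_p), size=int(poison_rate * len(y_p)), replace=False)
    y_p[idx] = 1 - y_p[idx]            # flip labels for a small fraction
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_p).score(X_te, y_te)
    print(f"poisoned {poison_rate:.0%} of labels -> test accuracy {acc:.3f}")
```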
While data poisoning is generally seen as a challenge to overcome for enterprise AI, it can be used to protect copyrighted works. In 2023, a team at the University of Chicago built a data poisoning tool, Nightshade, to defend artist copyrights. These copyrights are often infringed when AI companies use original work to train models without the artists’ consent.
When artists run their creations through Nightshade or its sister tool, Glaze, the applications subtly manipulate pixels in the copyrighted imagery. If AI companies scrape these artworks for training, the modified pixels cause models to misinterpret the images and malfunction when generating outputs.
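Nightshade’s actual optimization is far more sophisticated and is not reproduced here; the sketch below only illustrates the underlying idea of a pixel-level change too small for humans to notice (the filename and noise scale are placeholders).

```python
import numpy as np
from PIL import Image

img = np.asarray(Image.open("artwork.png"), dtype=np.float32)  # hypothetical file

# Add a small, visually negligible perturbation. Real tools like Nightshade
# optimize the perturbation so models misinterpret the image's content;
# this random noise only illustrates the "subtle pixel change" idea.
rng = np.random.default_rng(0)
perturbation = rng.uniform(-2.0, 2.0, size=img.shape)  # +/- 2 of 255 levels
cloaked = np.clip(img + perturbation, 0, 255).astype(np.uint8)
Image.fromarray(cloaked).save("artwork_cloaked.png")
```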
Artists may see this manipulation as a legitimate use case, but for AI developers, this form of data poisoning can necessitate combing through millions of data samples to find and remove corrupted images. According to Vitaly Shmatikov, a Cornell University professor who commented on the work, AI researchers do not yet have robust defenses against data poisoning of this kind.
Because correcting every malicious data instance is extremely time-consuming and challenging, organizations often have no choice but to retrain poisoned models to eliminate erroneous behavior. Proactive safeguards are a better alternative: they avoid service disruptions as well as the expense and time required to develop a new model.
Uncovering early signs of compromise is key to de-escalating data poisoning attacks and minimizing the spread of damage. This can be accomplished with continuous detection tools and procedures. Regularly audit models for signs of weak performance, as well as unexpectedly biased or inaccurate outputs. Auditing processes should document data throughout the AI lifecycle, including a data sample’s source, changes to the data, and user access points. This will enable you to trace an attack’s origin and investigate incidents more easily.
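One lightweight way to support that kind of audit trail, sketched below with hypothetical field names, is to attach a provenance record with a content hash to each sample as it enters the pipeline; if a stored sample’s hash later fails to match, the data has been altered.

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(sample: bytes, source: str, modified_by: str) -> dict:
    """Build an audit-trail entry for one training sample (fields are illustrative)."""
    return {
        "sha256": hashlib.sha256(sample).hexdigest(),  # detects later tampering
        "source": source,                              # where the data came from
        "modified_by": modified_by,                    # last user/process to touch it
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = provenance_record(
    b"raw sample bytes", source="public-web-crawl", modified_by="ingest-job-17"
)
print(json.dumps(record, indent=2))
```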
AI ethical hacking is also an effective way to test how models perform against malicious inputs and uncover areas for improvement. Once you’re familiar with how AI data poisoning may manifest in model behavior, educate customers and employees on these indicators to improve the chances of early detection.
Use data sanitization and validation techniques to remove potentially malicious inputs before training begins. Start by establishing strict guidelines and criteria for acceptable training data, and ensure all datasets adhere to those standards. Consider developing AI models to automatically detect anomalies within large datasets before they’re approved for training.
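As one possible sketch of that last idea (scikit-learn’s IsolationForest, run on synthetic stand-in data), an anomaly detector can flag the most unusual samples for human review before a dataset is approved:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

X, _ = make_classification(n_samples=2000, n_features=20, random_state=0)

# Flag the most anomalous ~1% of samples for human review before training.
detector = IsolationForest(contamination=0.01, random_state=0)
flags = detector.fit_predict(X)          # -1 = anomaly, 1 = inlier
suspect = X[flags == -1]
print(f"{len(suspect)} samples flagged for review out of {len(X)}")
```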
Evaluate the diversity of your training data. Diverse datasets, meaning those that fairly represent your desired task from a variety of perspectives, are generally harder for attackers to steer toward a specific outcome. Best practices in data security are also crucial for avoiding exposure. Encrypt all data at rest and in transit, and enforce strict access controls for training data repositories.
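For the encryption-at-rest piece, a minimal sketch using the cryptography library’s Fernet (symmetric, authenticated encryption; key storage and rotation are out of scope here):

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()              # in practice, store in a secrets manager
fernet = Fernet(key)

data = b"label,feature1,feature2\n..."   # stand-in for a real training file
ciphertext = fernet.encrypt(data)        # encrypt before writing to storage

# Fernet is authenticated: tampered ciphertext raises InvalidToken on decrypt,
# so silent modification of encrypted data at rest is detectable.
assert fernet.decrypt(ciphertext) == data
```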
In addition to training models for your core tasks, train them to recognize malicious inputs associated with a poisoning attack. This could include training a model to identify and block toxic or biased inputs like those ingested by Microsoft’s Tay chatbot.
Before using this technique, known as adversarial training, investigate how attackers are most likely to approach data poisoning in your particular use case, and train defenses accordingly. Consider complementary techniques like defensive distillation or feature squeezing, which help build data poisoning resiliency into model behavior.
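A minimal PyTorch sketch of one common adversarial-training variant (FGSM) is shown below; the model, optimizer, and data loader are placeholders, and this is one illustrative approach rather than a complete defense.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_step(model, x, y, optimizer, epsilon=0.05):
    """One training step on FGSM-perturbed inputs (a common adversarial-training variant)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()                      # populates x.grad (parameter grads are zeroed below)

    # Perturb each input in the direction that maximally increases the loss.
    x_adv = (x + epsilon * x.grad.sign()).detach()

    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

# Usage (model, optimizer, and train_loader are placeholders):
# for x, y in train_loader:
#     fgsm_adversarial_step(model, x, y, optimizer)
```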
Data poisoning is one of the foremost attack vectors targeting enterprise AI. While relatively easy for adversaries to deploy, the technique is often difficult to identify and can cause extensive damage to AI systems. Unlike conventional cybersecurity threats, which often exploit code errors or insecure passwords, data poisoning is unique because it targets the very building blocks of AI technology—the data itself.
There’s no surefire way to eliminate data poisoning risk or to reverse its effects once an AI system is compromised, short of retraining the model from scratch. However, organizations can implement preventative measures and stay current with new safeguards as adversarial techniques evolve.