Life’s a Breach: How AI Can Expose Your Data
AI-related data exposure occurs when employees, integrations, or third-party AI tools transmit sensitive organizational data outside of approved and controlled environments. This includes pasting proprietary information into public large language models or connecting unapproved AI plugins to internal systems. It also covers the use of AI vendors that retain submitted data without contractual protections.
The risk is not theoretical. Documented incidents and recurring patterns across the technology, healthcare, legal, and financial services sectors show that AI tools routinely receive sensitive data that was never intended to leave the organization. Understanding how exposure occurs and how to prevent it is now a core responsibility for IT and security leadership.
Current Examples of AI-Related Data Exposure
AI data exposure incidents are no longer rare edge cases. They are a recurring pattern across industries. Consequently, regulators, enterprise procurement teams, and compliance auditors are paying close attention to how organizations govern the use of AI tools.
The Samsung Source Code Incident
In 2023, Samsung engineers used ChatGPT to assist with debugging internal code. In the process, they submitted proprietary source code, test data, and internal meeting notes to the public AI tool. The data was transmitted to OpenAI’s servers, where it could be retained for model training purposes. Samsung subsequently restricted employee use of external generative AI tools on company devices and networks.
This incident is notable for several reasons. First, the engineers were not acting maliciously. They were solving a real problem with an available tool. Second, the exposure was complete before anyone in IT or security was aware it had occurred. Third, there was no technical control in place to prevent it. Ultimately, the lesson for other organizations is that intent does not determine outcome. Sensitive data submitted to a public AI tool is outside the organization’s control the moment it leaves the endpoint.
Healthcare, Legal, and Financial Sector Patterns
Samsung is not an isolated case. Multiple healthcare organizations have used AI-assisted medical scribing tools that handled protected health information (PHI) without signed Business Associate Agreements (BAAs). Without a BAA in place, disclosing PHI to an AI vendor is a HIPAA violation, regardless of how clinically helpful the tool is.
Similarly, attorneys at several firms have uploaded client documents to AI summarization tools to accelerate research. In most cases, the attorneys were unaware of the vendor’s data retention policies. Financial services firms, meanwhile, have discovered employees using consumer AI products to draft client communications, submitting account details, portfolio data, and internal strategies in the process. In each case, the exposure stemmed from the same root cause: employees using available tools in the absence of a governing policy.
How Sensitive Data Enters Public and Third-Party AI Tools
Attacks are not the main way data reaches AI tools. Most AI data exposure incidents begin with an employee making a reasonable decision with an available tool. Understanding the specific pathways is essential for building controls that actually address the problem.
Direct Input: Copy-Paste and File Upload Workflows
The most common pathway is direct input. For example, an employee copies text from a contract, a financial report, an internal proposal, or a customer record and pastes it into a public AI assistant for summarization, editing, or translation. Alternatively, they upload a file directly to an AI tool’s document-processing feature.
In both cases, the data leaves the organizational perimeter immediately, typically without any organizational logging. Moreover, many consumer AI tools retain submitted content by default, and some use it for model training unless the user specifically opts out. Most employees do not change default settings. As a result, the organization has no visibility into what was submitted, no record of the transmission, and no control over how the data is used afterward.
Integrations, Plugins, and Browser Extensions
A less visible but equally significant pathway is AI integration. Employees install browser extensions that use AI to assist with writing, research, or productivity. They connect AI tools to corporate email, calendar, or document management systems through plugin marketplaces. In other words, the AI is not just receiving what the user pastes. It is accessing everything the integration is authorized to reach.
Additionally, many AI integrations request broad permissions during setup. An AI writing assistant installed in a corporate email client may be authorized to read all messages, not just the ones the employee is actively drafting. Consequently, sensitive communications, client data, and confidential attachments can enter the AI’s processing environment without anyone realizing the scope of access granted. Vendor-supplied AI plugins embedded in productivity suites pose a similar risk if their data-handling terms are not reviewed before deployment.
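One way to act on this is to compare the permissions an integration requests with the permissions its use case actually needs. The Python sketch below is illustrative only: the scope names (such as `mail.read_all`), the integration names, and the set of scopes treated as broad are hypothetical placeholders, not any vendor’s real permission strings. In practice, the requested scopes would come from the consent screen or admin console of the platform involved.

```python
# Flag AI integrations whose requested permissions exceed what their use case needs.
# All scope and integration names below are hypothetical examples.
from dataclasses import dataclass, field

# Hypothetical scopes treated as high risk because they reach beyond
# the content the user is actively working on.
BROAD_SCOPES = {"mail.read_all", "files.read_all", "calendar.read_all", "contacts.read_all"}


@dataclass
class Integration:
    name: str
    requested_scopes: set[str]
    needed_scopes: set[str] = field(default_factory=set)


def excess_scopes(integration: Integration) -> set[str]:
    """Scopes the integration requests but its approved use case does not need."""
    return integration.requested_scopes - integration.needed_scopes


def review(integrations: list[Integration]) -> dict[str, set[str]]:
    """Map each integration to excess scopes that are also broad; clean ones are omitted."""
    findings = {}
    for integration in integrations:
        risky = excess_scopes(integration) & BROAD_SCOPES
        if risky:
            findings[integration.name] = risky
    return findings


if __name__ == "__main__":
    writing_assistant = Integration(
        name="ai-writing-assistant",
        requested_scopes={"mail.read_all", "mail.compose"},
        needed_scopes={"mail.compose"},
    )
    meeting_notes = Integration(
        name="ai-meeting-notes",
        requested_scopes={"calendar.read_event"},
        needed_scopes={"calendar.read_event"},
    )
    print(review([writing_assistant, meeting_notes]))  # {'ai-writing-assistant': {'mail.read_all'}}
```

Flagging only scopes that are both unnecessary and broad keeps the output focused on the kind of over-reach described above, such as an email assistant that can read every message.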
Compliance Risks When AI Meets Regulated Data
For organizations operating under regulatory frameworks, the compliance implications of AI data exposure are direct and significant. Uncontrolled AI use can lead to violations across multiple frameworks simultaneously.
HIPAA, CMMC, SOC 2, and ISO 27001 Exposure Points
Under HIPAA, any vendor that creates, receives, maintains, or transmits PHI on behalf of a covered entity must sign a BAA. Consumer AI tools generally do not offer BAAs. Therefore, submitting patient information to the consumer version of ChatGPT, a public AI scribing tool, or an AI-powered document editor without a BAA is a HIPAA violation regardless of whether a breach occurs.
For organizations subject to CMMC 2.0 and NIST SP 800-171, the issue is equally clear. Controlled Unclassified Information (CUI) must remain within the organization’s defined system boundary or with external service providers that meet the required safeguards. Sending CUI to an unapproved AI vendor violates access control and boundary protection requirements. Similarly, SOC 2 and ISO 27001 require documented controls over data access and third-party risk management, so an AI tool that never went through a vendor security review creates a gap in both frameworks.
Retention, Anonymization, and Audit Trail Gaps
Beyond the initial transmission, AI tools create secondary compliance risks. Many platforms retain submitted data for extended periods. Some use it to improve their models. Notably, most consumer AI products do not allow organizations to delete previously submitted content or obtain a record of what was processed.
This directly conflicts with data minimization requirements under privacy regulations and audit trail requirements under frameworks like SOC 2. Furthermore, if the AI tool’s training incorporates submitted data and later surfaces similar content in responses to other users, the original data has effectively been disclosed beyond the submitting organization. Consequently, the compliance exposure extends beyond the moment of submission.
Building an Employee AI Acceptable Use Policy
Most AI data exposure incidents happen because no policy exists to prevent them. Employees are not the problem. Instead, the absence of clear expectations and approved alternatives is the problem. An AI acceptable use policy creates the structure that prevents well-intentioned use from becoming a compliance or security incident.
What to Prohibit, Permit, and Monitor
An effective policy distinguishes between prohibited behaviors, permitted tools, and behaviors that require monitoring. Prohibited behaviors should include submitting regulated data categories, such as PHI, CUI, PII, financial records, and attorney-client privileged material, to any unapproved AI tool. Similarly, installing AI browser extensions on corporate devices without IT approval should be explicitly prohibited. Using AI tools to process client data without a vendor data-processing agreement should also be prohibited.
Permitted tools should be defined explicitly rather than implied. The policy should list approved AI platforms, the data categories each tool may receive, and any restrictions on use cases. Additionally, a review and approval process for new tools should be documented so employees know how to request access to a tool rather than simply using it.

Furthermore, monitoring expectations should be included. Employees should understand that the use of AI tools on corporate devices and networks is subject to review and that submissions to unapproved tools may trigger a security incident investigation.
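A policy that lists approved tools and the data categories each may receive can also be expressed as data, which makes it easier to check consistently. In this minimal Python sketch, the tool names and category labels are hypothetical, and the rule that unlisted tools receiving non-restricted data are routed to review rather than blocked outright is one possible design choice, not a requirement.

```python
# Express an AI acceptable use policy as data so every request is checked the same way.
# Tool names and category labels are hypothetical placeholders.
from enum import Enum


class DataCategory(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CLIENT = "client"
    PII = "pii"
    PHI = "phi"
    CUI = "cui"
    FINANCIAL = "financial"
    PRIVILEGED = "privileged"


# Categories that may never go to an unapproved tool.
RESTRICTED = {
    DataCategory.CLIENT,
    DataCategory.PII,
    DataCategory.PHI,
    DataCategory.CUI,
    DataCategory.FINANCIAL,
    DataCategory.PRIVILEGED,
}

# Approved tools and the categories each may receive (hypothetical entries).
APPROVED_TOOLS = {
    "enterprise-assistant": {DataCategory.PUBLIC, DataCategory.INTERNAL, DataCategory.CLIENT},
    "clinical-scribe-with-baa": {DataCategory.PUBLIC, DataCategory.PHI},
}


def check_use(tool: str, category: DataCategory) -> str:
    """Return 'permitted', 'prohibited', or 'needs-review' for a proposed use."""
    allowed = APPROVED_TOOLS.get(tool)
    if allowed is None:
        # Unapproved tool: restricted data is prohibited outright; anything else
        # goes through the documented review process instead of ad hoc use.
        return "prohibited" if category in RESTRICTED else "needs-review"
    return "permitted" if category in allowed else "prohibited"


if __name__ == "__main__":
    print(check_use("enterprise-assistant", DataCategory.CLIENT))  # permitted
    print(check_use("enterprise-assistant", DataCategory.PHI))     # prohibited
    print(check_use("unlisted-extension", DataCategory.PUBLIC))    # needs-review
```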
Vendor Review and Data Handling Standards for AI Tools
Approving an AI tool requires more than evaluating its features. It requires understanding how the vendor handles the data it receives. At a minimum, every AI vendor review should cover the following criteria.
| Evaluation Criteria | What to Ask | Why It Matters |
|---|---|---|
| Data retention | Does the vendor retain submitted content? For how long? | Retained data may be used for training or exposed in a breach |
| Training opt-out | Can the organization opt out of data being used to train models? | Consumer defaults often include training; enterprise tiers may differ |
| Data processing agreements | Will the vendor sign a DPA, BAA, or equivalent contract? | Required for HIPAA compliance and most enterprise regulatory frameworks |
| Data residency | Where is data stored and processed? US, EU, or other jurisdiction? | Affects GDPR, ITAR, and data sovereignty requirements |
| Access and deletion rights | Can submitted data be deleted on request? | Required by GDPR for personal data (right to erasure) and consistent with NIST data minimization principles |
| Security certifications | Does the vendor hold SOC 2, ISO 27001, or equivalent? | Provides third-party evidence of security controls |
| Subprocessors | Who else processes the data? Are subprocessors disclosed? | Undisclosed subprocessors extend the data handling chain beyond the vendor |
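To keep reviews consistent and documented, the criteria in the table can be captured as a structured record with a pass/fail check. This Python sketch assumes a simplified model: the field names, the accepted certifications, and the allowed regions are hypothetical inputs each organization would set for itself, and it treats every unmet criterion as a finding.

```python
# Record an AI vendor review and list the criteria it fails.
# Field names, accepted certifications, and allowed regions are hypothetical inputs.
from dataclasses import dataclass, field
from typing import Optional

ACCEPTED_CERTIFICATIONS = {"SOC 2", "ISO 27001"}


@dataclass
class VendorReview:
    vendor: str
    retains_content: bool
    retention_days: Optional[int]  # None when the vendor does not disclose a period
    training_opt_out: bool
    signs_dpa: bool
    signs_baa: bool
    data_residency: str  # e.g. "US" or "EU"
    deletion_on_request: bool
    subprocessors_disclosed: bool
    certifications: set[str] = field(default_factory=set)


def findings(review: VendorReview, *, handles_phi: bool, allowed_regions: set[str]) -> list[str]:
    """Return one finding per unmet criterion; an empty list means the review passed."""
    issues = []
    if review.retains_content and review.retention_days is None:
        issues.append("retention period not disclosed")
    if not review.training_opt_out:
        issues.append("no training opt-out")
    if not review.signs_dpa:
        issues.append("will not sign a DPA")
    if handles_phi and not review.signs_baa:
        issues.append("will not sign a BAA for PHI")
    if review.data_residency not in allowed_regions:
        issues.append(f"data stored in {review.data_residency}, outside allowed regions")
    if not review.deletion_on_request:
        issues.append("no deletion on request")
    if not review.certifications & ACCEPTED_CERTIFICATIONS:
        issues.append("no accepted security certification")
    if not review.subprocessors_disclosed:
        issues.append("subprocessors not disclosed")
    return issues


if __name__ == "__main__":
    review = VendorReview(
        vendor="example-ai-vendor",
        retains_content=True,
        retention_days=30,
        training_opt_out=True,
        signs_dpa=True,
        signs_baa=False,
        data_residency="US",
        deletion_on_request=True,
        subprocessors_disclosed=True,
        certifications={"SOC 2"},
    )
    print(findings(review, handles_phi=True, allowed_regions={"US"}))
    # ['will not sign a BAA for PHI']
```

Storing the record alongside its findings gives each approval decision the documentation trail that audits expect.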
The vendor review process for an AI tool should be the same process used for any third-party system that handles regulated or sensitive data. Marketing an AI tool as productivity software does not reduce the organization’s obligations for the data it handles.
Practical Steps for Reducing AI Data Exposure
The following steps create a layered approach to AI data protection. They are ordered to reflect what can be implemented quickly versus what requires planning and coordination.
Immediate Actions
First, conduct an AI inventory. Identify every AI tool in use across the organization, including browser extensions, productivity plugins, API integrations, and standalone applications. Most organizations discover that employees are using more AI tools than IT is aware of. This inventory is the baseline for all subsequent decisions.
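Web proxy or DNS logs are a practical starting point for this inventory. The Python sketch below assumes a CSV export with `user` and `host` columns and a small, hand-maintained map of AI service domains; both are illustrative assumptions, and a real domain list would need to be far more complete and kept current. It only sees traffic that passes through the proxy, so server-to-server integrations and off-network devices need separate discovery.

```python
# First-pass AI inventory from a web proxy log export.
# Assumes CSV columns "user" and "host"; the domain map and APPROVED set are illustrative.
import csv
import io
from collections import Counter
from typing import Optional

# Small, incomplete sample of AI service domains; a real list must be maintained.
KNOWN_AI_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}
APPROVED = {"Gemini"}  # hypothetical approved-tool list


def match_service(host: str) -> Optional[str]:
    """Return the AI service for a host, matching the domain itself or any subdomain."""
    host = host.lower().rstrip(".")
    for domain, service in KNOWN_AI_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return service
    return None


def inventory(log_file) -> Counter:
    """Count requests per (service, user) for hosts that belong to a known AI service."""
    counts = Counter()
    for row in csv.DictReader(log_file):
        service = match_service(row["host"])
        if service is not None:
            counts[(service, row["user"])] += 1
    return counts


def unapproved_services(counts: Counter) -> set[str]:
    """Services seen in the logs that are not on the approved list."""
    return {service for service, _user in counts if service not in APPROVED}


if __name__ == "__main__":
    sample = io.StringIO(
        "user,host\n"
        "alice,chatgpt.com\n"
        "alice,chatgpt.com\n"
        "bob,gemini.google.com\n"
        "carol,www.example.com\n"
    )
    counts = inventory(sample)
    print(counts[("ChatGPT", "alice")])  # 2
    print(unapproved_services(counts))   # {'ChatGPT'}
```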
Second, implement data loss prevention (DLP) rules for AI endpoints. DLP tools can detect and block attempts to submit specific data categories, such as credit card numbers, Social Security numbers, or strings matching CUI markings, to unapproved AI destinations. This provides a technical control that enforces policy even when employees are unaware of the restriction.

Third, apply least-privilege access controls to all approved AI integrations. Limit the data categories and system access that each AI tool is authorized to reach.
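The DLP rule described in the second step can be illustrated with a simple pre-submission check. This is a minimal Python sketch, not a substitute for a commercial DLP product: the regular expressions are simplified, the `APPROVED_AI_HOSTS` set and its `example.com` hostname are hypothetical, and matching the literal word `CUI` stands in for proper detection of CUI banner markings. Card numbers are confirmed with the Luhn checksum to cut down on false positives from arbitrary digit runs.

```python
# Minimal DLP-style check run before text is sent to an AI destination.
# Patterns are simplified and APPROVED_AI_HOSTS is hypothetical.
import re

SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
CUI_RE = re.compile(r"\bCUI\b")  # stand-in for real CUI banner-marking detection

APPROVED_AI_HOSTS = {"ai.internal.example.com"}


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum over the digits in candidate, ignoring spaces and dashes."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return 13 <= len(digits) <= 19 and total % 10 == 0


def detect(text: str) -> set[str]:
    """Return the sensitive data categories found in text."""
    found = set()
    if SSN_RE.search(text):
        found.add("ssn")
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE_RE.finditer(text)):
        found.add("card")
    if CUI_RE.search(text):
        found.add("cui")
    return found


def check_submission(host: str, text: str) -> tuple[bool, set[str]]:
    """Return (allowed, findings); unapproved hosts are blocked if anything is found."""
    found = detect(text)
    allowed = host in APPROVED_AI_HOSTS or not found
    return allowed, found


if __name__ == "__main__":
    allowed, found = check_submission(
        "chat.example.net", "Customer card 4111 1111 1111 1111, SSN 123-45-6789"
    )
    print(allowed, sorted(found))  # False ['card', 'ssn']
```

Findings are returned even for approved destinations so they can still be logged for the monitoring described later.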
Ongoing Governance Actions
Fourth, establish a vendor review process and apply it to every AI tool before approval. Use the evaluation criteria in the section above and document the review outcome for each tool. Fifth, publish an AI acceptable use policy and train employees on it. The policy is only effective if employees know it exists and understand what it requires.
Finally, monitor AI tool usage on corporate networks and devices. Log AI-related traffic and include AI data handling in the scope of your regular security reviews. Similarly, build a process for employees to report AI tools they want to use, so that requests go through review rather than resulting in unauthorized deployment. As a result, the organization maintains visibility without blocking productivity.
Frequently Asked Questions About AI and Data Exposure
Questions About How Exposure Happens
Data can be exposed through AI tools without a breach in the traditional sense. The exposure occurs when sensitive data is submitted to an AI platform that the organization does not control. The data may be retained, used for model training, or accessible to the vendor’s staff without any malicious actor being involved. Consequently, standard breach detection tools are unlikely to identify this type of exposure because, from a technical standpoint, no unauthorized access occurred.
A paid AI subscription does not automatically protect organizational data. Paid business and enterprise tiers often include stronger data protections, but the specific terms vary by vendor and subscription tier. Some enterprise plans include automatic opt-out from training data use. Others require the organization to manually enable the setting. Furthermore, a paid individual subscription is different from an enterprise agreement. An employee paying for a personal Pro subscription on a corporate device is not the same as the organization having a data processing agreement with the vendor. Legal liability and compliance obligations attach to the organization, not the individual employee.
Questions About Policy and Compliance
At a minimum, organizations should prohibit the submission of regulated data categories to unapproved AI tools. These include protected health information (PHI) under HIPAA, Controlled Unclassified Information (CUI) under CMMC and NIST SP 800-171, and personally identifiable information (PII) under applicable state privacy laws. Financial data, attorney-client privileged communications, and proprietary source code should also be on the list. However, the specific list should reflect the organization’s regulatory environment and data classification framework.
A data processing agreement (DPA) with an AI vendor should cover several specific obligations. It should define what data the vendor may receive, how long they may retain it, and whether the data may be used for model training. Additionally, the DPA should specify the vendor’s security controls, breach notification requirements, and subprocessor disclosures. Unlike general terms of service, a DPA sets out specific, enforceable data handling obligations and gives the organization legal recourse if those obligations are violated.
How Tego Helps Organizations Govern AI and Protect Sensitive Data
AI data exposure is a governance problem first and a technical problem second. Technical controls are important, but they enforce decisions that have already been made. In other words, the policy and vendor review work must come before the tooling. Tego helps organizations make those decisions correctly: which tools to approve, how to structure policies, what vendor reviews must cover, and how to build monitoring that provides visibility without disrupting productivity.
Specifically, Tego conducts AI risk assessments that inventory current tool usage, identify data exposure pathways, and produce a prioritized remediation plan. For organizations subject to CMMC 2.0 compliance requirements or managing CUI handling obligations, Tego maps AI tool usage to the relevant control families and identifies where current practices fall outside the defined system boundary.
Furthermore, Tego develops AI acceptable use policies that reflect the organization’s specific regulatory environment and operational context. These are not generic templates. They are documents built to match the actual AI tools in use, the data categories the organization handles, and the compliance frameworks it operates under.
Tego also implements the technical controls that enforce policy decisions: DLP rules for AI endpoints, access controls on approved integrations, network monitoring for AI traffic, and vendor security reviews. For organizations wanting a baseline understanding of their current security posture before building an AI governance program, Tego’s IT Maturity Assessment (Tego Tech Check) is a practical starting point.
Start Building an AI Governance and Data Protection Program
Sensitive data is being entered into AI tools in your organization right now. The question is whether there are policies, technical controls, and vendor agreements in place to govern that exposure. Organizations that build AI governance programs proactively avoid the regulatory and reputational consequences of incidents that are otherwise entirely preventable.
Tego offers AI governance engagements and data protection assessments designed to give organizations visibility, control, and audit documentation across their AI environment. Contact Tego to schedule a conversation about where your exposure stands and what needs to change.