Privacy and AI in the workplace: a critical and costly risk for directors

The privacy risk of generative AI in the workplace is a ticking time-bomb; however, the tools to defuse it early are available to directors willing to confront the issue head-on.

While the risks are real, pressing and significant, approaching the issue with rigour now creates a huge opportunity to become a leader both in your industry and nationally. Boards that do so can move forward confidently, using AI to increase productivity and to explore areas for expansion.

Our legal experts in this area understand the models, the privacy issues and the new risks that crop up on a weekly or even daily basis. We can assist you with a robust AI use policy and tailored advice.

Background

The Cyber Security Strategy 2026–2030 and accompanying Action Plan 2026–2027, published on 27 February 2026, expressly flagged a new civil pecuniary penalty regime under the Privacy Act 2020 (PA 2020). The consultation paper recommends director-level penalties of up to NZ$500,000 for those who negligently, recklessly or knowingly fail to meet minimum cyber security requirements.

Aside from personal penalties, companies face the prospect of class-action claims by potentially thousands of people impacted by any data breach that compromises their personal information.

While the emerging privacy and security risks that come with generative AI are alarming, our view is that:

  • This is an opportunity to be an industry leader and establish a responsible and effective AI privacy model that gives your business an edge in both preventing and effectively responding to data breaches.
  • Gaining that edge requires an understanding of how AI models are built and how they work in practice. We recommend achieving this through a mixture of personal learning of the basics and engaging external experts to work with you on policies, cyber security tools and education for your workforce.

This article focuses on building privacy protection at an early stage. Our next article in this series will focus on responding to a data breach.

1. How a generative AI model works, in plain terms

Tokens and vectors: putting words on a giant map

Inside a generative AI model, every word (or part of a word) is broken into a unit called a token. Each token is given a long list of numbers called a vector. Think of the vector as a set of map coordinates. A simple two-dimensional map needs only two numbers, for example [3, 6]; modern models use thousands of dimensions, so each token sits at a position in a vast high-dimensional space. The clever part is that words with similar meanings end up close together on this map. To oversimplify: “Doctor,” “nurse” and “hospital” cluster in one neighbourhood; “yacht,” “sail” and “harbour” in another.
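To make the “map” idea concrete, the toy sketch below scores how close two words sit using invented, four-number vectors and a standard similarity measure. The numbers are illustrative only; a real model learns vectors with thousands of dimensions during training.

```python
# A toy illustration of token vectors and "closeness" of meaning.
# The vectors below are invented for demonstration; real models learn
# them automatically, with thousands of dimensions per token.
import numpy as np

embeddings = {
    "doctor":   np.array([0.90, 0.80, 0.10, 0.00]),
    "nurse":    np.array([0.85, 0.75, 0.15, 0.05]),
    "hospital": np.array([0.80, 0.90, 0.20, 0.10]),
    "yacht":    np.array([0.10, 0.05, 0.90, 0.85]),
}

def cosine_similarity(a, b):
    """Score between -1 and 1: higher means the words sit closer on the 'map'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["doctor"], embeddings["nurse"]))   # high: same neighbourhood
print(cosine_similarity(embeddings["doctor"], embeddings["yacht"]))   # low: different neighbourhoods
```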

Training: a billion rounds of fill-in-the-blank

How did the words end up in the right neighbourhoods? The model is trained on enormous quantities of text by playing a kind of fill-in-the-blank. It reads a passage, predicts the next word, checks the actual answer, and adjusts its internal numbers to end up a little closer next time. Repeat this billions of times across an enormous body of text and you produce a system in which the patterns of human language, knowledge and reasoning are baked in.
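The runnable toy below captures the fill-in-the-blank idea using simple word counts instead of a neural network: each time one word is seen following another, a counter is nudged, so the “most probable next word” prediction improves as more text is read. Real training adjusts billions of parameters rather than counts, but the loop is the same in spirit.

```python
# A toy, runnable sketch of next-word training. Instead of a neural network,
# it keeps simple counts: each time one word follows another, the count goes
# up, so the "most probable next word" prediction improves with more text.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # counts[previous_word][next_word]

def train(text):
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1               # the "adjust the internal numbers" step

def predict_next(prev_word):
    options = counts[prev_word]
    return max(options, key=options.get) if options else None

train("the patient saw the doctor and the doctor prescribed rest")
print(predict_next("the"))   # "doctor" – seen most often after "the"
```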

Generation: the words “talk”; the model picks the next one

When a user enters a prompt, the words in the prompt influence one another to establish context (engineers call this mechanism “attention”). To give a simplistic example, the word “planted” means something different in the sentences “The CIA planted a spy” and “Jack planted a beanstalk”. The model figures out the context and the importance of these words to one another. It then runs the contextualised prompt through dozens of mathematical layers and produces a probability score for every possible next word. It usually picks one of the highest-scoring words – not always the very top one, because a small element of randomness (called “temperature”) is built in to keep responses from being repetitive. The chosen word is added to the answer, the context recalculated, and the cycle repeats – word, by word, by word.
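The short sketch below shows the final step – picking the next word – using invented scores; a real model computes those scores from the full context, but the role of temperature is the same: the higher it is, the more often a lower-scoring word is chosen.

```python
# A toy sketch of choosing the next word. The scores are invented for
# illustration; a real model computes them from the full context.
# "Temperature" controls how much randomness is allowed in the choice.
import math
import random

def sample_next_word(scores, temperature=0.8):
    """scores: {word: raw score}. Higher temperature means more variety."""
    scaled = {w: math.exp(s / temperature) for w, s in scores.items()}  # softmax-style weighting
    total = sum(scaled.values())
    words = list(scaled)
    weights = [scaled[w] / total for w in words]
    return random.choices(words, weights=weights)[0]

# Invented scores for the next word after "The CIA planted a ..."
next_word_scores = {"spy": 4.2, "bug": 3.9, "tree": 1.1, "beanstalk": 0.3}
print(sample_next_word(next_word_scores))  # usually "spy" or "bug", occasionally not
```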

The key insight for directors: The model is not picking words at random, but it is also not verifying facts. It is selecting from the words it considers most probable (with some variety) given everything that came before. After training on a vast amount of human knowledge, “most probable” and “accurate” overlap a surprising amount of the time – but not always. When a plausible-sounding wrong answer fits the pattern better than the truth, the model will produce the wrong answer, confidently. This is the technical root of “hallucination” and it is also the technical root of several Privacy Act risks set out below.

2. Where prompts and outputs are stored – and what the PA 2020 says about it

Every prompt and every output passes through, and is generally logged on, the provider’s servers. But what then happens to those prompts and outputs follows two quite different pathways, and the difference matters.

First pathway: training

The first pathway is what most people think of when they hear the words “AI” and “privacy” in the same sentence: the provider uses the inputted prompts to further train its model. The setting that controls this is usually labelled something like “allow my data to be used to improve the model”, and on consumer-tier accounts – the free and individual paid plans (such as Plus and Pro) of ChatGPT and Claude – it is generally on by default.

When training is on, a user’s words don’t get stored as a readable copy in a database; they get absorbed into the model’s parameters – the billions of numbers we described in Section 1 – as statistical patterns. No human reads them in the ordinary course. The risk is downstream: a future user, asking the model an unrelated question, may receive an output in which the model regurgitates a memorised fragment of what the user typed. That is how a confidential client passage entered into a public chatbot can end up surfacing, near-verbatim, in someone else’s answer six months later.

Second pathway: access

The second pathway is underestimated and not effectively managed by simply using an Enterprise model. Whether or not training is enabled, the prompt and the output are typically stored as readable text on the provider’s infrastructure for a retention period – commonly 30 days to 24 months – in logs maintained for performance monitoring, safety, analytics and legal compliance. These logs are searchable, in plain text, by real human beings: the provider’s engineers, trust-and-safety reviewers, legal teams, and – when ordered – external counsel, regulators and law-enforcement officers.

The distinction in one sentence: Switching off “train on my data” stops user words being baked into the model’s parameters. It does not stop those words being stored in plain text on the provider’s servers, where staff (and, in some cases, litigants) can read them. Two different risks, two different settings, and most enterprise contracts address only the first.

What providers actually retain on the access pathway

Logs commonly include the full text of prompts, the model’s outputs, and metadata such as account details, timestamps, device identifiers and IP addresses. Retention periods vary by provider and tier. The provider’s staff can search those logs during the retention window. A “temporary” or “no history” mode usually means the chat is hidden from the user’s visible history. It does not necessarily mean the data is purged from the provider’s systems.
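To make the access pathway concrete, the hypothetical record below illustrates the kind of fields such a log might hold. The field names and values are invented for illustration; actual schemas vary by provider and are not public.

```python
# A purely hypothetical illustration of a provider-side access log record.
# Every field name and value here is invented; real schemas vary by provider.
hypothetical_log_entry = {
    "account_id": "org-12345",
    "user": "jane.director@example.co.nz",
    "timestamp": "2026-03-14T09:42:07Z",
    "ip_address": "203.0.113.42",
    "device": "Chrome / Windows 11",
    "prompt": "Summarise the attached employee medical report for the board...",
    "output": "Summary: the employee has been diagnosed with...",
    "retention_expires": "2027-03-14",   # stored as readable text until then
}
```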

Discovery, warrants and the absence of “AI privilege”

AI chat logs are increasingly treated by overseas courts as electronically stored information subject to discovery in litigation, regulatory investigations and criminal prosecutions. AI is not a lawyer, and asking it for legal advice does not automatically make the prompt privileged, so in many cases these prompts and outputs will also be disclosable when an individual submits a privacy information request.

The working assumption directors should adopt: Treat every AI interaction as potentially discoverable. If you are using it to gather notes for the purpose of obtaining legal advice, document that clearly and then actually provide the material to a lawyer for that purpose.

4. Practical solutions: what Boards should require

Risk-by-risk response

This table borrows in part from the Privacy Commissioner’s guidance. It is far from exhaustive. It is critical that you develop and refine an AI use policy. We can work with you to develop a tailored, detailed and robust model for your business, or review your existing policy. Cyber security partners we work with, such as Aura Information Security (part of Kordia), should be involved here too.

Privacy risk: Inaccurate or biased outputs (IPP 8) – hallucinations, perpetuated errors and overseas-trained bias against Māori, Pacific peoples and other groups.
Practical solutions:
  • Perform a Privacy Impact Assessment before any AI tool that touches personal information goes live.
  • Ask probing questions of AI providers on their testing and audits of pre-deployment bias.
  • Most importantly – educate your workforce on the limitations of AI and the privacy risks. Employees will be both more proficient in the use of AI and commit fewer privacy breaches if they have a greater technological literacy.
  • Your policy should include prescribed parameters on search terms, for example including a caveat in each prompt regarding limits on retrieval of information from private accounts when the model tacks on a web-search.

Privacy risk: Training pathway – memorisation and regurgitation (IPPs 5, 10, 11) – inputs absorbed into model parameters and surfacing in another user’s output.
Practical solutions:
  • Use enterprise instances that contractually exclude inputs from training, or switch off the “train on my data” setting on consumer accounts.
  • Prohibit identifiable personal information, client material and trade secrets in consumer-tier tools.
  • Where data must be used, de-identify before input (a minimal sketch follows this table).

Privacy risk: Access pathway – storage, retention and foreign legal process (IPPs 5, 9, 12) – provider logs are searchable in plain text and may be subject to overseas legislation, like the US CLOUD Act, foreign warrants and preservation orders.
Practical solutions:
  • Map data flows for every approved AI tool. Prefer providers with contractual retention/deletion guarantees and zero-data-retention options.
  • Require a documented IPP 12 assessment in every PIA.
  • If you want your search to be privileged – document that clearly and then actually provide it to a lawyer for the purpose of obtaining legal advice.

Privacy risk: Indirect collection (new IPP 3A, from 1 May 2026) – information about a person that arrives via an AI tool, agency or third-party feed.
Practical solutions:
  • Update privacy notices and employment agreements to disclose AI use, monitoring, indirect collection sources and biometric processing.
  • Refresh recruitment and HR collection statements.

Privacy risk: Access and correction (IPPs 6, 7) – employees can ask to see and correct AI-generated assessments.
Practical solutions:
  • Keep an audit trail for every AI-assisted decision: tool used, data in, output, human reviewer, reasoning.
  • Build a correction pathway and an appeal route to a human decision-maker. Be prepared to append a correction notice to information.

Privacy risk: Shadow AI – staff using unapproved consumer AI tools, often unknown to the Board, with the organisation still the agency under the Privacy Act.
Practical solutions:
  • Have the AI steering committee own a clear AI use policy: approved tools, prohibited inputs, mandatory human oversight for significant decisions, an appeal mechanism, and a stated position on whether unapproved tools are blocklisted or just monitored.
  • Provide a sanctioned enterprise tool so staff have a safe alternative, and pair it with short, scenario-based training.
  • Make your approved AI model as accessible as you can, so your staff do not take an easier route with free models on their phones.

Privacy risk: Biometric processing – facial recognition, voiceprints, fingerprint scanning for time-and-attendance, security or access control.
Practical solutions:
  • Comply with the Biometric Processing Privacy Code. Existing processing must be brought into compliance by 3 August 2026; new processing has been required to comply since 3 November 2025.
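The de-identification step mentioned in the training-pathway row can start with a pattern-based redaction pass run before any text is sent to an AI tool. The sketch below is a minimal illustration only, and the patterns are assumptions for demonstration: they catch structured identifiers such as emails and phone numbers, but will miss names and context that identify a person indirectly, so purpose-built tooling and human review remain necessary.

```python
# A minimal, illustrative de-identification pass run before text is sent to an
# AI tool. Pattern-based redaction is a starting point only: it catches
# structured identifiers but will miss names and indirect identifiers.
import re

REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\+?64|0)[\s-]?\d{1,2}[\s-]?\d{3}[\s-]?\d{3,4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{2,3}-\d{3}-\d{3}\b"), "[ID NUMBER]"),
]

def de_identify(text: str) -> str:
    for pattern, placeholder in REDACTION_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

prompt = "Email jane.doe@example.co.nz or call 021 555 1234 about her complaint."
print(de_identify(prompt))
# Email [EMAIL] or call [PHONE] about her complaint.
```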
