D3 — Intellectual Property & Copyright

Medium severity | EU AI Act Art. 53 | Australian Copyright Act 1968 | NIST AI 600-1 | EU DSM Art. 4

Domain: D — Data | Jurisdiction: AU, EU, US, Global


Layer 1 — Executive card

AI systems trained on publicly available content may reproduce copyrighted material, and AI-generated code may carry licence contamination risks.

AI-generated content may reproduce copyrighted material without attribution or licence. AI-generated code may include GPL-licensed segments that contaminate proprietary codebases. Multiple class-action lawsuits have been filed against AI providers (OpenAI, Google, Stability AI, GitHub Copilot) for training on copyrighted content without authorisation. The organisational risk operates at two levels: outbound (content you publish) and inbound (your IP submitted to external AI tools).

Do we have human review and licence scanning in place for AI-generated content and code before it is used in production or published externally?

AI-generated content or code may reproduce copyrighted material. GPL contamination in AI-generated code is a particular risk for proprietary codebases. A "no" answer means AI-generated outputs are not currently reviewed for IP compliance before use.


Layer 2 — Practitioner overview

Likelihood drivers

  • AI-generated content published without human review for third-party reproduction
  • AI tools used for code generation without licence compatibility checking
  • Organisational IP submitted to external AI tools without enterprise data protection terms
  • No policy on AI-generated content documentation

Consequence types

| Type | Example |
| --- | --- |
| IP infringement claim | Copyright holders claim reproduction in AI-generated outputs |
| Licence contamination | GPL-licensed segments in proprietary codebase |
| IP loss | Proprietary content submitted to external AI training pipelines |
| Reputational harm | Public IP infringement findings |

Affected functions

Legal · Technology · Compliance · Marketing · Research

Controls summary

| Control | Owner | Effort | Go-live? | Definition of done |
| --- | --- | --- | --- | --- |
| Output review policy for AI-generated content | Legal | Low | Required | Policy requires human review of AI-generated content before publication, checking for verbatim reproduction. Documented and communicated. |
| Code licence scanning | Technology | Low | Required | AI-generated code scanned for licence-incompatible segments before inclusion in production codebase. Integrated into CI/CD or code review process. |
| AI-generated content labelling | Technology | Low | Post-launch | Internal records maintained identifying AI-generated content. Labelling applied before content enters production or is published. |
| Vendor training data documentation | Procurement | Low | Required | AI vendors have provided documentation on training data provenance and copyright compliance per EU AI Act Art. 53(1)(d). |

Layer 3 — Controls detail

D3-001 — Output review policy for AI-generated content

Owner: Legal | Type: Preventive | Effort: Low | Go-live required: Yes

Establish and enforce a policy requiring human review of AI-generated content before it is published externally, submitted to regulators, or used in client-facing materials. The policy must be operational — integrated into publishing and sign-off workflows — not simply a document that exists. The most common failure mode is not the absence of a policy but the absence of a mechanism that makes compliance with the policy the path of least resistance.

Implementation requirements: (1) Policy scope — define what constitutes "AI-generated content" for policy purposes. Scope must cover: text generated by generative AI tools (including Microsoft Copilot, ChatGPT, Google Gemini, internal LLM deployments); AI-assisted drafting where AI produced substantial portions; code generated by AI coding assistants; (2) Review requirements — the human review must check for: verbatim or near-verbatim reproduction of third-party material (articles, books, reports, code); fabricated citations or sources (hallucination risk — see A1); content that may infringe registered trademarks; (3) Workflow integration — embed the review requirement as a mandatory step in publishing, legal sign-off, and code review workflows. The reviewer must explicitly attest that IP review was completed. Where publishing systems are used, consider a mandatory field or checklist; (4) Documentation — retain a record of what content was AI-generated and that review was completed. Minimum retention: 7 years for regulatory correspondence, contract documents, and financial materials; (5) Communication — communicate the policy to all staff with access to AI generation tools, and include it in AI acceptable use training.
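The workflow-integration requirement in (3) can be reduced to a small publishing gate: content flagged as AI-generated does not pass until a reviewer attestation is on record. A minimal sketch; the `PublishRecord` fields and gate logic below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class PublishRecord:
    content_id: str
    ai_generated: bool
    ip_review_completed: bool = False
    reviewer: str = ""
    review_date: str = ""  # ISO 8601 date of the IP review

def publishing_gate(record: PublishRecord) -> tuple[bool, str]:
    """Return (allowed, reason); block AI-generated content without attestation."""
    if not record.ai_generated:
        return True, "No AI involvement declared; standard workflow applies."
    if record.ip_review_completed and record.reviewer:
        return True, f"IP review attested by {record.reviewer} on {record.review_date}."
    return False, "BLOCK: AI-generated content requires a completed IP review attestation."
```

In practice the gate would sit in the CMS or release pipeline; the point is to make completing the attestation the path of least resistance.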

Jurisdiction notes: AU — Copyright Act 1968 — reproduction of a substantial part of a copyright work without licence or exception constitutes infringement; the test is qualitative (importance of the reproduced portion) not just quantitative | EU — EU AI Act Art. 53(1)(d) — providers of general purpose AI models must publish a sufficiently detailed summary of training data used; deployers must comply with copyright law. EU DSM Directive Art. 4 — the general text and data mining exception covers commercial use of lawfully accessed content, but rights holders may reserve their rights under Art. 4(3); the Art. 3 research exception does not cover commercial AI use. Art. 17 — platforms may have additional obligations | US — copyright in AI outputs is unsettled; US Copyright Office (2023, 2024) guidance — AI-generated content without sufficient human authorship is not copyrightable, creating both a risk (your AI content may not be protectable) and an obligation (you cannot claim copyright in reproduced AI output)


D3-002 — Code licence scanning

Owner: Technology | Type: Preventive | Effort: Low | Go-live required: Yes

AI coding assistants (GitHub Copilot, Amazon CodeWhisperer, Cursor, and similar) generate code trained on public repositories. A portion of that training data is GPL, LGPL, AGPL, or other copyleft-licensed code. When copyleft-licensed segments are included in a proprietary codebase without compliance with licence terms, it creates a licence contamination obligation — in the most serious cases requiring the entire codebase incorporating the contaminated module to be disclosed under the copyleft licence.

Implementation requirements: (1) Scan requirement — all AI-generated code must be scanned for licence-incompatible segments before inclusion in production codebases. This applies to code generated by any AI tool, including tools marketed as enterprise-safe; (2) Tooling — integrate a software composition analysis (SCA) tool capable of detecting copied or near-matched code against known open-source repositories. Tools include FOSSA, Black Duck, Snyk Open Source, and licensee. Copilot-specific: GitHub Advanced Security includes a code referencing feature that identifies when Copilot suggestions match public code; enable this; (3) Integration point — scanning should occur as a step in the code review process and optionally as a CI/CD gate. The gate should block merge of any code with a critical licence finding (copyleft contamination) pending legal review; (4) Finding disposition — for each finding: categorise the licence (copyleft, permissive, proprietary); assess the contamination risk; either rewrite the identified segment without the AI tool, seek legal review for licence compliance options, or accept with documented rationale; (5) Maintain a scanning log — retain scan results for each release. This provides a defence in any licence dispute.
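The finding-disposition step in (4) is essentially a decision table keyed on licence risk tier. A minimal sketch, assuming the same risk tiers used in the Layer 4 scanner and treating unknown licences as critical by default; the wording of each disposition is illustrative, not legal advice.

```python
DISPOSITION = {
    "critical": "Rewrite the segment without the AI tool, or obtain legal sign-off before merge.",
    "high": "Refer to legal for licence compliance options; document the outcome.",
    "medium": "Retain required attribution notices; record in the scanning log.",
    "low": "Record in the scanning log.",
    "none": "No action required.",
}

def disposition_for(licence: str, risk_map: dict[str, str]) -> str:
    """Map a detected licence to a disposition; unknown licences default to critical."""
    risk = risk_map.get(licence, "critical")
    return f"{licence or 'UNKNOWN'} ({risk}): {DISPOSITION[risk]}"
```

For example, `disposition_for("GPL-3.0", {"GPL-3.0": "critical", "MIT": "low"})` recommends rewriting or legal sign-off; a licence the map has never seen gets the same conservative treatment.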

Jurisdiction notes: AU — Copyright Act 1968 — open-source licences are legally binding; violation of GPL terms constitutes copyright infringement. FOSS licence compliance is a contractual obligation, not merely a technical preference | EU — EU AI Act Art. 53 — GPAI model providers must comply with EU copyright law, including transparency obligations where training included copyrighted works. Software Directive 2009/24/EC — copyright in computer programs is strongly protected in EU | US — GPL enforcement has been tested in US courts (Software Freedom Conservancy litigation); copyleft contamination creates real legal exposure. Oracle v. Google established that copyright in software APIs is enforceable, setting a precedent for strict software copyright interpretation


D3-003 — AI-generated content labelling

Owner: Technology | Type: Detective | Effort: Low | Go-live required: No (post-launch)

Maintain internal records identifying what content in production has been AI-generated. This is both a governance requirement and a practical necessity — without labelling, the organisation cannot systematically conduct IP review, respond to regulatory inquiries, or audit compliance with its own acceptable use policy.

Implementation requirements: (1) Internal labelling system — implement metadata tagging for AI-generated content within content management systems, document repositories, and code repositories. At minimum: flag that AI was used, record which tool, record the date; (2) Code commits — adopt a convention for commit messages or pull request descriptions identifying AI-assisted code generation. Example: [AI-ASSISTED] tag in commit subject; (3) Document metadata — for documents managed in SharePoint, Confluence, or equivalent: add a required metadata field for AI assistance status. Options: None / AI-assisted (human primary) / AI-generated (human reviewed); (4) External disclosure — assess where external disclosure of AI-generated content is required or emerging as an expectation. Regulatory submissions in some jurisdictions are beginning to expect disclosure. Do not conflate internal labelling (governance) with external disclosure (regulatory/reputational decision); (5) Periodic audit — conduct quarterly spot-checks of published content to assess labelling compliance. Report results to the AI governance committee.
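The commit convention in (2) can be enforced mechanically only in part: a hook cannot know whether AI was actually used, but it can require that the [AI-ASSISTED] tag, when present, leads the subject line so tooling can find AI-assisted commits by prefix. A sketch of a commit-msg hook body under that assumption:

```python
import sys

AI_TAG = "[AI-ASSISTED]"

def commit_msg_hook(message: str) -> int:
    """commit-msg hook body: return 0 to accept the commit, 1 to reject it."""
    subject = message.splitlines()[0] if message.strip() else ""
    # If the tag appears anywhere in the message, require it at the start
    # of the subject line, in canonical capitalisation.
    if AI_TAG.lower() in message.lower() and not subject.startswith(AI_TAG):
        print(f"Reject: {AI_TAG} must lead the subject line, "
              f"e.g. '{AI_TAG} add retry logic'", file=sys.stderr)
        return 1
    return 0
```

Installed as `.git/hooks/commit-msg`, the surrounding script would read the message file from `sys.argv[1]` and exit with the returned code.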

Jurisdiction notes: AU — no mandatory external disclosure requirement for AI-generated content as of 2026; OAIC has noted AI-generated content in privacy policies and regulatory submissions as an emerging area | EU — EU AI Act Art. 50 — AI systems generating synthetic content (images, audio, video, text) must ensure outputs are marked in machine-readable format (C2PA or equivalent); applies from August 2, 2026 for GPAI-generated synthetic media | US — FTC has signalled that undisclosed AI-generated content in consumer-facing contexts may constitute deceptive practice under Section 5 FTC Act; disclosure expectations are evolving rapidly


D3-004 — Vendor training data documentation

Owner: Procurement | Type: Preventive | Effort: Low | Go-live required: Yes

When procuring AI tools or models from vendors, require documentation of training data provenance and copyright compliance as a condition of procurement. This is the organisation's primary mechanism for managing inbound IP risk from vendor-supplied AI — the organisation is a deployer and may share liability for IP infringement in outputs if it cannot demonstrate due diligence in vendor selection.

Implementation requirements: (1) Procurement checklist — add to the standard vendor AI due diligence questionnaire: (a) what data was used to train the model; (b) how was copyright compliance assured for training data; (c) does the vendor provide indemnification for IP claims arising from model outputs; (d) what opt-out or exclusion mechanisms exist for copyright holders; (e) is training data documentation available under NDA for audit purposes; (2) Indemnification clause — for AI tools where outputs will be used in commercial products or client deliverables, negotiate an IP indemnification clause in the contract. Assess whether the vendor's standard terms include this — many enterprise AI vendors (Microsoft Copilot, Google, Adobe) have introduced copyright indemnification programmes with conditions; (3) EU AI Act Art. 53 compliance — for GPAI model providers, Art. 53(1)(d) requires a publicly available summary of training data used for copyright law compliance. Verify this is available for each GPAI model procured; (4) Annual review — training data documentation becomes stale as vendors update their models. Include a trigger in vendor contracts requiring notification of material training data changes and an annual review of compliance documentation.
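The questionnaire in (1) is only useful if completeness is checked before procurement sign-off. A minimal sketch; the shorthand keys for the five items are assumptions of this example, not a standard schema.

```python
REQUIRED_ITEMS = {
    "training_data_described",      # (a) what data trained the model
    "copyright_assurance_stated",   # (b) how copyright compliance was assured
    "indemnification_assessed",     # (c) IP indemnification for outputs
    "optout_mechanism_documented",  # (d) opt-out mechanisms for rights holders
    "docs_available_for_audit",     # (e) documentation available under NDA
}

def questionnaire_gaps(responses: dict[str, str]) -> set[str]:
    """Return the required items with no substantive (non-empty) answer."""
    return {k for k in REQUIRED_ITEMS if not responses.get(k, "").strip()}
```

An empty result from `questionnaire_gaps` means the assessment can proceed to the indemnification and Art. 53 checks; any non-empty result goes back to the vendor.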

Jurisdiction notes: AU — Australian Copyright Act 1968 — no specific text and data mining exception for commercial AI training; vendors training on Australian-published content without licence or exception are potentially infringing; deployers procuring such models may have secondary liability | EU — EU AI Act Art. 53(1)(d) — mandatory for GPAI providers to publish copyright compliance summaries. EU DSM Art. 4 — the text and data mining exception is conditional: rights holders may reserve their rights under Art. 4(3), and content so reserved cannot lawfully be mined for commercial AI training | US — Authors Guild v. OpenAI and related cases (2023–2025) are defining the US framework; until settled, deployers should seek indemnification rather than assume fair use applies to all vendor training data


KPIs

| Metric | Target | Frequency |
| --- | --- | --- |
| AI-generated content published without documented review | Zero | Monitored continuously |
| AI-generated code with critical licence findings unresolved | Zero at production merge | Tracked per release |
| AI tools with training data documentation on file | 100% of procured AI tools | Audited annually |
| IP indemnification clause in place | 100% of commercial AI tool contracts | Reviewed at procurement and annual renewal |
| AI content labelling compliance (spot-check) | > 95% of sampled content correctly labelled | Quarterly audit |

Layer 4 — Technical implementation

Licence scanning — CI/CD integration

import subprocess
import json
from dataclasses import dataclass
from typing import Literal

LicenceRisk = Literal["critical", "high", "medium", "low", "none"]

# Licence categories — risk level for inclusion in a proprietary codebase
LICENCE_RISK_MAP = {
    # Critical — copyleft, strong contamination risk
    "GPL-2.0": "critical",
    "GPL-3.0": "critical",
    "AGPL-3.0": "critical",
    # High — weak copyleft, requires licence compliance steps
    "LGPL-2.1": "high",
    "LGPL-3.0": "high",
    "MPL-2.0": "high",
    "EUPL-1.2": "high",
    # Medium — permissive with attribution requirements
    "Apache-2.0": "medium",
    "BSD-2-Clause": "medium",
    "BSD-3-Clause": "medium",
    # Low / None — permissive, minimal restrictions
    "MIT": "low",
    "ISC": "low",
    "CC0-1.0": "none",
    "Unlicense": "none",
}

@dataclass
class LicenceFinding:
    file_path: str
    matched_package: str
    matched_licence: str
    risk_level: LicenceRisk
    match_confidence: float  # 0.0 – 1.0
    recommendation: str

def scan_ai_generated_code(
    file_paths: list[str],
    sca_tool: str = "licensee",
) -> dict:
    """
    Run an SCA licence scan on AI-generated code files.
    Integrate into CI/CD as a blocking gate for critical findings.

    Returns a findings dict with a go/no-go recommendation.
    """
    findings: list[LicenceFinding] = []
    for path in file_paths:
        # In production: call your SCA tool CLI and parse its output.
        # Example with the FOSSA CLI:
        #   result = subprocess.run(
        #       ["fossa", "analyze", "--only-target", path, "--format", "json"],
        #       capture_output=True, text=True,
        #   )
        #   findings.extend(parse_fossa_output(result.stdout))
        pass

    critical = [f for f in findings if f.risk_level == "critical"]
    high = [f for f in findings if f.risk_level == "high"]

    return {
        "findings": findings,
        "critical_count": len(critical),
        "high_count": len(high),
        "gate_passed": len(critical) == 0,
        "recommendation": (
            "BLOCK: Critical copyleft findings require legal review before merge."
            if critical
            else "WARN: High-risk licence findings — legal review recommended."
            if high
            else "PASS: No critical licence findings detected."
        ),
        "critical_findings": critical,
    }

Output IP review checklist — automation assist

import re

# Patterns indicative of verbatim reproduction risk.
# These are heuristic triggers for human review — not definitive detection.
REPRODUCTION_INDICATORS = [
    r"(?i)copyright\s+\(c\)",       # explicit copyright notice
    r"(?i)all rights reserved",
    r"(?i)licensed under",
    r"(?i)reprinted with permission",
    r"(?i)source:\s+\w+",           # inline attribution (may signal copied text)
]

def flag_for_ip_review(content: str) -> dict:
    """
    Heuristic scan of AI-generated content for reproduction indicators.
    Flags for human review — does not determine infringement.
    """
    flags = []
    for pattern in REPRODUCTION_INDICATORS:
        matches = re.findall(pattern, content)
        if matches:
            flags.append({
                "pattern": pattern,
                "matches": matches[:3],  # sample only
            })

    # Flag unusually long verbatim-looking passages: segments over 200
    # characters with little punctuation variation (fewer than 3 commas).
    long_segments = [
        seg for seg in re.split(r"[.!?]\s+", content)
        if len(seg) > 200 and seg.count(",") < 3
    ]

    return {
        "reproduction_indicators": flags,
        "long_segments_flagged": len(long_segments),
        "requires_human_review": len(flags) > 0 or len(long_segments) > 0,
        "review_priority": "high" if len(flags) > 0 else "standard",
    }

Vendor AI due diligence questionnaire — schema

from dataclasses import dataclass, field

@dataclass
class VendorAIDueDiligence:
    vendor_name: str
    product_name: str
    assessment_date: str
    assessor: str

    # Training data
    training_data_described: bool
    training_data_summary_url: str  # EU AI Act Art. 53(1)(d) — public summary
    copyright_compliance_assurance: str  # vendor's stated approach

    # Indemnification
    ip_indemnification_offered: bool
    indemnification_conditions: list[str]
    indemnification_exclusions: list[str]

    # Opt-out mechanisms
    copyright_holder_optout_available: bool
    optout_mechanism_description: str

    # Enterprise data protection
    data_not_used_for_training: bool  # confirms submitted data not used to train model
    enterprise_terms_in_place: bool
    enterprise_terms_reference: str

    # Assessment outcome
    procurement_approved: bool
    conditions: list[str] = field(default_factory=list)
    legal_review_completed: bool = False
    legal_reviewer: str = ""
    notes: str = ""

Compliance implementation

Australia: Copyright Act 1968 — no text and data mining exception exists in Australian law for commercial AI training or output use. Reproduction of a substantial part of a copyright work without a licence, exception, or fair dealing defence constitutes infringement. The fair dealing provisions (research, criticism, review, news reporting) are narrow and unlikely to cover routine commercial AI use of copyrighted content. Code is protected as a literary work. The Australian Law Reform Commission has recommended reform but no legislation has passed as of 2026 — operate on current law. For AI-generated output: the question of whether an AI can be an author is unsettled in Australian law; caution is warranted about IP claims in AI-generated content.

EU: EU AI Act Art. 53 — providers of GPAI models must publish copyright compliance summaries of training data; deployers of such models must satisfy themselves of compliance. EU DSM Directive Art. 4 — the text and data mining exception permits extraction from lawfully accessed content, including for commercial purposes; Art. 4(3) — rights holders may reserve their rights, and AI vendors mining content whose rights holders have opted out are in breach. C2PA (Coalition for Content Provenance and Authenticity) standard for content watermarking is emerging as the EU implementation mechanism for Art. 50 synthetic content disclosure requirements.

US: The US copyright framework for AI remains the most contested globally. Key developments as of 2026: US Copyright Office (2023, 2024) — AI-generated content without human authorship elements is not copyrightable; human selection, arrangement, and creative expression can support copyright in AI-assisted works. Authors Guild v. OpenAI and related cases are testing whether training on copyrighted works constitutes fair use — outcomes will significantly affect the US risk profile. DMCA safe harbour does not apply to training data infringement. For enterprise risk management: seek IP indemnification from AI vendors; implement output review; do not rely on assumed fair use until courts settle the question.


Incident examples

Class-action lawsuits against AI providers (2023–2025): Multiple class-action lawsuits have been filed against OpenAI, Google, Stability AI, and GitHub Copilot for training on copyrighted content without authorisation. The cases are ongoing and are establishing the legal framework for AI IP liability.

GPL contamination via code generation (documented risk): AI-generated code has been found to include segments from GPL-licensed open-source projects, creating licensing obligations for proprietary codebases when such code is incorporated without licence review.


Scenario seed

Context: A development team uses an AI coding assistant to accelerate development. Code is reviewed for functionality but not licence compatibility.

Trigger: A legal audit flags a function in the production codebase that is nearly identical to GPL-licensed open-source code.

Difficulty: Foundational | Jurisdictions: AU, EU, US

▶ Play this scenario in the AI Risk Training Module — AI-Generated Code & Licence Contamination, four personas, ~11 minutes.