
Research & Publications

Scientific research and evidence on AI markets and the architectures, mechanisms, and disclosures that underpin them.

Screenshot of 'The attribution crisis in LLM search results' published in Cambridge University Press's Data & Policy journal, April 2026 Data & Policy (Cambridge University Press) ↗
APR 2026
APPLIED ECONOMICS LETTERS
DOI 10.1080/13504851.2026.2651435 ↗
Applied Economics Letters · Taylor & Francis

M&A as Capability Building

Patent Evidence from Big Tech

Traditional antitrust focuses on market shares, but many Big Tech acquisitions have technological motivations that cut across distinct product markets. We advance a patent-based framework in which Big Tech’s acquisitions aim either to consolidate technological capabilities horizontally or to diversify them vertically, by buying proven or nascent technologies (proxied by predicted patent citation potential). Linking 995 Big Tech M&A transactions (2000–2022), of which 223 yielded patents, to 114,004 patent families, we find that diversification dominates: four of five Big Tech firms acquire predominantly outside their core technology areas, consistent with ecosystem expansion. Firms vary in whether they favour proven or nascent technologies.

I. Strauss, J. Yang

Governing AI Through SEC Disclosure

Drawing on more than 7,800 AI-related 8-K filings, we show that around two-thirds are overwhelmingly positive in tone and avoid ‘negative’ news. Building on the SEC’s 2023 cybersecurity reporting rule, we propose a materiality-first AI disclosure regime involving: (1) SEC guidance clarifying what a ‘material’ AI risk is; (2) a dedicated AI-incident item on the 8-K form; (3) a standing section in the annual 10-K form on AI strategy, governance, risk, and dependencies; and (4) SEC enforcement against AI-washing and other violations.

I. Strauss, T. O'Reilly, S. Rosenblat, I. Moore
Policy & Regulation
Open Protocols Can Prevent AI Monopolies

Protocols and Power

Can we head off AI monopolies before they harden? As AI models become commoditized, incumbent Big Tech platforms are racing to rebuild their moats at the application layer, around context. We argue that open protocols, exemplified by Anthropic’s Model Context Protocol (MCP), serve as a powerful rulebook, helping to keep API-exposed context fluid and to prevent Big Tech from using data lock-in to extend their monopoly power. To make MCP and other open protocols effective, we recommend authorized access to user-owned platform data, portable agentic memory independent of any single application, and rules limiting how AI services exploit user context.

M. Moure, T. O'Reilly, I. Strauss
AUG 2025
TOWARDS DATA SCIENCE
SSRC DOI 10.35650/AIDP.4119.d.2025 ↗

MCP In Practice

We provide evidence from 2,874 MCP servers that the ecosystem is already concentrating around a small number of high-use servers, with most activity clustered in software engineering, web automation, and database/search use cases. The policy challenge is therefore not only to promote protocol adoption but also to ensure that MCP remains contestable, through open APIs, portable memory, transparent dependencies, fallback access routes, and safeguards governing how agents read from and act on user context.

S. Rosenblat, I. Strauss, T. O'Reilly, I. Moore
JUN 2025
DATA & POLICY (CAMBRIDGE UNIVERSITY PRESS)
DOI 10.1017/dap.2026.10064 ↗

The Attribution Crisis in LLM Search Results

Estimating Ecosystem Exploitation

Drawing on approximately 14,000 real-world LMArena conversation logs with search-enabled LLMs, we document the ‘attribution gap’ — the difference between the web pages an LLM consults and those it credits. We find that 34% of Google Gemini and 24% of GPT-4o responses are generated without fetching online content; Gemini provides no clickable citation in 92% of answers; and Perplexity Sonar visits roughly ten relevant pages per query but cites only three or four. Citation-efficiency estimates of 0.19–0.45 across systems show this is a design choice, not a technical limit, and we propose standardised retrieval telemetry and full disclosure of search-and-citation traces as a remedy.

I. Strauss, J. Yang, T. O'Reilly, S. Rosenblat, M. Moure
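
The 'attribution gap' described in the abstract above (pages consulted versus pages credited) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `attribution_gap`, the example URLs, and the simple "share cited" ratio, which is not necessarily the estimator behind the paper's citation-efficiency figures.

```python
def attribution_gap(consulted, cited):
    """Return (uncredited pages, share of consulted pages that were cited).

    Hypothetical helper: treats the gap as a plain set difference between
    the pages a system fetched and the pages it credited in its answer.
    """
    consulted_set = set(consulted)
    cited_set = set(cited)
    uncredited = consulted_set - cited_set   # fetched but never credited
    credited = consulted_set & cited_set     # fetched and credited
    share = len(credited) / len(consulted_set) if consulted_set else None
    return sorted(uncredited), share


# Made-up query log: ten pages fetched, three of them cited.
consulted = [f"https://example.com/page{i}" for i in range(1, 11)]
cited = consulted[:3]

gap, share = attribution_gap(consulted, cited)
print(len(gap), share)
```

A per-query ratio like this is only a descriptive summary; comparing systems would also require the retrieval traces that the abstract proposes disclosing.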
APR 2025
FORTHCOMING IN IEEE ACCESS JOURNAL
SSRC DOI 10.35650/AIDP.4112.d.2025 ↗

Real-World Gaps in AI Governance Research

Drawing on 1,178 safety and reliability papers within a corpus of 9,439 generative-AI papers (January 2020 – March 2025), we compare the research outputs of leading AI companies (Anthropic, Google DeepMind, Meta, Microsoft, OpenAI) with leading AI universities (CMU, MIT, NYU, Stanford, UC Berkeley, University of Washington). Corporate research increasingly concentrates on pre-deployment topics — alignment, testing, and evaluation — while attention to deployment-stage issues such as bias, hallucinations, copyright, and persuasive or addictive design has waned. We recommend expanding external-researcher access to deployment data and building systematic observability of in-market AI behaviour.

I. Strauss, I. Moore, T. O'Reilly, S. Rosenblat
APR 2025
FORTHCOMING IN AI & ETHICS JOURNAL
SSRC DOI 10.35650/AIDP.4111.d.2025 ↗

Beyond Public Access in LLM Pre-Training Data

Using a legally obtained dataset of 34 copyrighted O'Reilly Media books, we apply the DE-COP membership-inference attack to test whether OpenAI's language models recognise pay-walled book content. GPT-4o exhibits recognition patterns consistent with prior exposure (AUROC 0.82, 95% bootstrapped CI 0.60–0.96), while the smaller GPT-4o Mini does not (AUROC 0.56, CI 0.28–0.83). Though sample-limited, the result makes a tractable case for mandatory disclosure of pre-training data sources and formal licensing frameworks for AI training content.

S. Rosenblat, T. O'Reilly, I. Strauss
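
For readers unfamiliar with the statistics quoted in the abstract above, the following is a minimal sketch of how an AUROC and a percentile-bootstrap confidence interval can be computed. It does not implement DE-COP itself, and the paper's exact procedure may differ. The scores are invented, and the split into "positive" items (content plausibly seen in training) and "negative" items (content plausibly unseen) is an assumption made for illustration.

```python
import random


def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample each group with replacement and recompute AUROC."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        stats.append(auroc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented recognition scores, one per item (not data from the paper).
pos_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4]
neg_scores = [0.7, 0.5, 0.45, 0.35, 0.3, 0.2]

point = auroc(pos_scores, neg_scores)
lower, upper = bootstrap_ci(pos_scores, neg_scores)
print(f"AUROC={point:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

With only a handful of items per group the interval is wide, which is the same sample-size caveat the abstract raises.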

See our earlier work with Prof. Mariana Mazzucato at the UCL Institute for Innovation and Public Purpose on Algorithmic Attention Rents ↗