SAG-AFTRA expands likeness enforcement into multimodal scope
SINGAPORE launches sovereign multimodal compute program
CAPITAL multimodal infra rounds overtake LLM-only
DINNER 04/27 thirteen at table · three convictions sharpened
AGI HOUSE
VOL. 01 · NO. 01 · MAY 2026
Biweekly Intelligence Briefing · Issue 01
The Merging of Modalities.
Image, video, text, and world-knowledge are collapsing into single general-purpose models. What that means for the frontier, the application layer, and the institutions absorbing the shift — ahead of the DeepMind & Gemini Build Day on May 30.
AGI House exists because artificial general intelligence is the most consequential technology of our generation, and the people building it should be in rooms with each other often. The Intelligence Report translates what happens in those rooms into a recurring publication for builders, operators, and institutions. We publish biweekly, AI-natively, with names on the bylines.
Contributors
Reported by the AGI House team and community speakers. Analytics and insight by the AGI House Platform. Produced by Katherina Nguyen.
§ 01 · The Foyer
The Foyer.
Two perspectives on the theme from the editors, and the history underneath it.
Rocky Yu
Editor · CEO & Founder, AGI House
May 25, 2026
The merging of modalities is a story about teams, not models.
When Nicole and Naina from the Nano Banana team came on the AGI House podcast, the line that stayed with me wasn't about diffusion or training data. It was an aside about org structure: the Imagen team and the Gemini team had been folded into one, and the model came out of the merged team. The merged team came before the merged model. I've now heard a version of that sentence in three different rooms at AGI House: from frontier-lab researchers, from a Veo PM, and from one of our portfolio founders shipping at the seams between image and video. The pattern is too consistent to ignore.
So here's the thing I'd bet on. The companies that ship coherent multimodal products over the next two years won't be the ones with the best image model, or the best video model, or the deepest audio bench. They'll be the ones whose research and product orgs were structured to converge first. Everyone else will keep shipping incoherent surfaces glued onto strong components, and they'll wonder why the experience never feels like one thing. The Build Day on May 30 is, partly, a wager on this — we're putting the people who work at the seams in one room and seeing what they ship in eight hours. If I'm right about the team thesis, the winning demos will look like the teams that built them.
— RY
Katherina Huong Nguyen
Editor · Frontier Tech Strategist
May 25, 2026
When modalities merge, so do the legal and labor categories built around them.
— KN
A short history of the modality boundary
The technical separation of modalities is older than computing. Text and image had different printing presses. Image and film had different studios. Film and television had different broadcast regulators. Each separation produced its own infrastructure (publishing, photography, cinema, broadcasting), its own labor force (writers, photographers, directors, producers), and its own legal regime. When digital media arrived in the 1990s, each modality kept its inherited category — JPEG was an image, MP3 was audio, MP4 was video, HTML was text, even though all four were now the same underlying bits. The merging that Gemini, GPT-4o, Claude, and the new multimodal frontier models are doing is the first time the modality boundary has dissolved at the production layer, not just the delivery layer. The institutional consequences are still being worked out.
Five threads worth following.
Signposts
01
Image-to-Everything Models
The Nano Banana team's explicit thesis: modalities lift each other up as the base model scales.
Fig. 01 · Frontier Multimodal Capability Map · 2024 → 2026
AGI House Research · normalized capability index
Four modalities, four trajectories — and the merge point.
Public benchmark scores normalized to a 0–100 capability index, plotted across eight quarters. Text plateaus first; image and audio climb steadily; video makes the largest gain and pulls the others into a shared envelope by Q2 2026.
Series: Text, Image, Audio, Video
Source: AGI House Research · normalized from MMLU, GPQA, MMMU, AudioBench, VideoBench-2026
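The figure says the benchmark scores were normalized to a 0–100 index but does not give the method. One common way to do this is min-max scaling per benchmark, then averaging within each modality. The sketch below assumes that approach. The benchmark names, floors, ceilings, and scores are hypothetical placeholders, not AGI House data.

```python
from statistics import mean

def to_index(score: float, floor: float, ceiling: float) -> float:
    """Min-max scale a raw benchmark score onto 0-100, clamped to that range."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    scaled = 100.0 * (score - floor) / (ceiling - floor)
    return max(0.0, min(100.0, scaled))

def modality_index(scores: dict[str, tuple[float, float, float]]) -> float:
    """Average the scaled scores of every benchmark assigned to one modality.

    `scores` maps a benchmark name to (raw score, floor, ceiling).
    """
    return mean(to_index(s, lo, hi) for s, lo, hi in scores.values())

# Hypothetical inputs: a chance-level floor of 25% and a ceiling of 100%.
text_benchmarks = {
    "benchmark_a": (85.0, 25.0, 100.0),
    "benchmark_b": (62.5, 25.0, 100.0),
}
print(modality_index(text_benchmarks))
```

Scaling against a chance-level floor rather than zero keeps a benchmark with a high guessing baseline from inflating the index. That is a design choice of this sketch, not something the figure states.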
Field Notes · On the Record
From founders and frontier-lab researchers.
Yang Tang
Founder & CEO · Opus Clip
It's a combination of science and art. A combination of rationality and feeling. Everyone in our company watches a lot of videos. The model curation, the editing, the polish — all of it ingests human taste.
Mustafa Bhuiyan
Founder · Nomadic ML
Visual reasoning systems have to cater to your stack, not to general, not well-fitting solutions out there. Synthetic data doesn't survive contact with the long tail of physical environments.
Karan Vaidya
Cofounder · Composio
Integrations are going to probably become a commodity a year or two years down the line. The differentiation comes in through the data loop — knowing which three parameters out of ten thousand actually get used.
David Chen
GM, Robotics · LiveKit
Robot models will follow the LLM trajectory — growing too large for the edge, moving to cloud inference, requiring a realtime streaming layer between the robot and the model.
Introductions: research@agihouse.org
Portrait Gallery
Frontier figures.
Two from the labs shaping how multimodal models reach the world.
Nicole & Naina · 2025
Nicole & Naina
Product team, Nano Banana, the image model that helped Gemini overtake ChatGPT as the #1 free app.
At: Google DeepMind
Recorded: AGI House Podcast · Sep 2025
The merged team came before the merged model. Capabilities lift each other up as the base scales.
A working playback of Kat's interview with Josh Payne — the AI-native interview format the AGI House podcast has been refining. Listen, jump between moments, pull up a chair and ask the room.
By the Numbers
Portfolio spotlight, 5/16 retro, 5/30 preview.
A look at the portfolio shaping the issue's theme, the numbers from the room on May 16, and what's on the horizon.
AGI Ventures · Portfolio Spotlight
Five portfolio companies, framed against the Gemini-class merge thesis.
Saturday, May 30, 2026 · AGI House, Hillsborough · In collaboration with Google DeepMind
Five live metrics from the room.
Teams Building: ~40 expected
Modality Coverage: Image · Video · Audio · Text
Demos Using Veo: target 50%+
Demos Using Nano Banana: target 60%+
International Teams: target 20%+
§ 03 · The Dinner Table
The Dinner Table.
A column from the host's chair at the AGI House Dinner Series.
Hillsborough · April 27, 2026
AGI House · Dinner Series
Course of Convictions.
Monday · April 27, 2026 · 7:30 PM · Hillsborough
Tonight's Host
Rocky Yu
Setting a deceptively simple question — what actually gets us to AGI? — for a table of frontier researchers, founders, and investors.
At the table: Oriol Vinyals (VP Research, Google DeepMind; Gemini co-architect) · Andrew Dai (CEO, Elorean AI; 12-year DeepMind veteran) · Jiajun Wu (Stanford professor, vision & robotics) · Fan-yun Sun (Cofounder, Moonlight AI) · Zayd Enam (CEO, EnamCo; Cofounder, Cresta) · Nick Oupurov (CEO, Fleet AI) · Nazneen Rajani (CEO, Collinear; ex-Hugging Face) · Xiang Deng (Cofounder, NeoCognition) · Alex Wang (Stanford PhD) · Brian Zhan (Partner, Striker Venture Partners) · Bill Sun (CEO, GAlpha; early Google Brain attention researcher) · Andrew Ma (Director, Turing) · Rocky Yu (Host).
There is no single path forward. But there are clear fault lines shaping the future. This evening's question — what actually gets us to AGI? — surfaced a set of disagreements worth keeping, on world models, on gaming versus robotics, on systems versus weights, and on the compute bottleneck that quietly dominates everything else.
Invite Only
AGI House Dinners are exclusive events. Attendance is by invitation of the host. To request an introduction, write to research@agihouse.org.
The dinner unfolded across four courses (appetizer, main, dessert, digestif), and the disagreements from each course appear in the counterfactual run below.
A Counterfactual Dinner
Run it back, in your seat.
Same room, same guests, new script. Sit at the head of the table, listen to the courses unfold, and pull up a chair when you want to push back. Generated live.
§ 04 · The Reading Room
The Reading Room.
Memos and papers worth the time.
A room of memos, papers, and field-notes for the merging frontier.
From the AGI House Blog
Three posts most relevant to the merging-of-modalities theme.
AGI House Research · Multimodality
Native Multimodal Architectures: Why Cross-Modal Fusion Defines the Next Defensible Moat
Jessica Chen · AGI House Blog
A structural look at where fusion happens — encoder, latent, decoder — and which architectures defend value as the modality boundary collapses. The most direct companion piece to this issue's central thesis.
AGI House Research · Multimodality
Google Gemini 3: Launch Notes & Project Guide
Jessica Chen · AGI House Blog
The capability surface of Gemini 3, what the multimodal API now supports, and the project ideas that fit best — ahead of the May 30 DeepMind & Gemini Build Day.
AGI House Research · Dinner Series
World Models, Agents, and the Path to AGI
Rocky Yu · AGI House Blog
The full recap from the April 27 dinner referenced throughout §03. World models vs. scaled end-to-end, gaming vs. robotics, system vs. weights — and where AGI actually lands.
Suggested White Papers
From the academic and enterprise community.
ANTH · 2026.03
Anthropic · Economic Index
Anthropic Economic Index Report: Learning Curves & Diversification
Anthropic Research · March 2026
Empirical study of Claude.ai and API usage patterns, documenting consumer-side diversification, API-side concentration, and uneven geographic adoption.
§ 05 · The Great Hall · Science, Technology & Society
The Merging of Modalities Is an Institutional Event in Disguise.
An STS reading of multimodal AI, against the institutional history of media. What the technology actually breaks, who has to rebuild what, and where the value will land.
By Katherina Huong Nguyen · Reading time: 12 min · Broadcast: 16 min
Every previous expansion of the media stack arrived with the institutions it needed already built. The printing press inherited copyright and publishing. Photography inherited likeness rights and the studio. Cinema inherited unions, guilds, and broadcast regulators. The multimodal frontier is the first time the production of media has expanded without the institutions to absorb it — and that absence, not the capability itself, is the story this issue is tracking.
I. What the merging actually does.
The technical story is simple to describe and easy to under-read. A frontier multimodal model is a single set of weights that ingests text, image, audio, and video, and emits any of the four. That is the surface change. The deeper change is that, for the first time, the production of media is no longer separated from its understanding. A model that can read a frame can write one. A model that can listen to a take can produce one. The pipeline collapses, and so does the rationale for having separate tools, separate teams, and — over a longer horizon — separate institutions to govern them.
The merge also breaks a working assumption that has been quietly load-bearing for two centuries: that each modality is its own object of regulation, its own labor market, and its own asset class. JPEGs were governed differently from MP3s not because the bits were different, but because the institutions producing them were. When the same model produces both, the institutions producing them are the same. That is the merger this issue is actually tracking.
The more you're building these general purpose models, they benefit everyone. Nano Banana benefits a lot from our image understanding capabilities, our video understanding capabilities.
— Nicole, Google DeepMind Nano Banana team
The lift effect Nicole describes is not a marketing line. It is the empirical claim that capability in one modality raises capability in the others once the base model is shared. If true at scale, it inverts the procurement logic of the last cycle: instead of picking a best-in-class vendor per modality, the dominant strategy becomes consolidating on whichever vendor's base model is improving fastest across all four. The lab-to-product transmission belt narrows. So does the moat.
II. The institutional history this merger is colliding with.
Each modality acquired its institutional infrastructure at a different historical moment, in response to a different production technology, and around a different labor structure. Reading the merger against that backdrop is what tells you which institutions will absorb it and which will simply break.
Fig. II · Institutional regimes per modality · 1450 – 2026
Each modality inherited its own institutional infrastructure. The merged model has none.
Source: AGI House Research synthesis.
Three observations follow from reading the diagram literally. First, every regime in the chart is older than the digital-era institutions written on top of it — publishing and copyright are five and a half centuries old, photography and likeness rights almost two, broadcast roughly one. Second, every regime is structured around a labor force, not a product: writers, photographers, musicians, performers. Third, none of them anticipated a single producer that traverses all four.
That third observation is where the institutional vacuum lives. The merged-output row at the bottom of the chart has no inherited regime to absorb it, because no prior production technology produced anything that crossed those boundaries in a single artifact. The closest analog is the silent-to-talkie transition in cinema, which collapsed image and audio production into a single shoot — and which took a decade of labor and copyright realignment to settle. Multimodal compresses an analogous reshuffling across four lanes at once.
III. Where value accrues during an institutional vacuum.
Institutional vacuums are not neutral. Whoever sets the de facto rules during the gap usually keeps them. Three layers are doing the rule-setting right now, and the order in which they congeal is the most predictable thing about the next eighteen months.
Platform-level policy is being set by the frontier labs and their distribution partners. Watermarking standards, training-data opt-outs, prompt-disclosure conventions, age-gating — these are being decided by Google, Anthropic, OpenAI, and the cloud platforms that distribute them, in dialogue with a small number of large customers. By the time the EU AI Act's implementing rules and the US Copyright Office's guidance arrive, the practical defaults will already be in place. Regulation will codify what the labs negotiated.
Litigation infrastructure is consolidating. This quarter's landmark-scale performer-rights settlement (see Landscaping) is the early signal: a small number of plaintiffs' firms are positioning themselves as the de facto enforcement layer for likeness, voice, and now multimodal-output cases. Their settlements, not the statutes that follow, set the licensing structure other producers will adopt.
It's a combination of science and art. Combination of rationality and feeling. Everyone in our company watches a lot of videos.
— Yang Tang, Opus Clip
The taste-and-curation layer is the most under-discussed and most economically interesting. Once any team can produce technically competent output across four modalities, the binding constraint stops being capability and starts being judgment. Yang Tang's observation about Opus Clip is the company-scale version of it. The same dynamic governs which Veo-rendered shot makes it into a feature, which Nano Banana output ships into a product surface, which voice clone reads which line. That layer is human, low-throughput, and very hard to commoditize. Its economic value rises as the cost of the underlying model output falls.
Great Hall · Broadcast Edition · Visual Brief
16 min · Video + Audio available · 5 chapters
I. What the merging actually does
Same weights. Four modalities. One pipeline.
For the first time, the production of media is no longer separated from its understanding. A model that can read a frame can write one. A model that can listen to a take can produce one.
“The more you’re building these general purpose models, they benefit everyone.”
— Nicole, Google DeepMind
II. The institutional history
Each modality inherited its own institutional infrastructure.
Five centuries of separate regimes — copyright, photography, broadcast, cinema — built around separate production technologies. The merged model has none.
1450 · Text · Copyright & Publishing
1839 · Image · Likeness & Licensing
1877 · Audio · Broadcast Rights
1895 · Video · Guilds & Regulators
2024 · Merged · no regime
III. Where value accrues
Whoever sets the de facto rules during the gap usually keeps them.
Three layers are doing the rule-setting right now. The order in which they congeal is the most predictable thing about the next eighteen months.
01 · Platform Policy · Labs & distribution partners
02 · Litigation Infrastructure · Plaintiffs’ firms
03 · Taste & Curation · Under-discussed, durable
IV. Where specialization survives
A general model lifts the floor. It does not raise the ceiling on judgment.
Specialization survives where human production capital — domain expertise, taste, on-set craft — does not transfer to the new medium for free.
The next 18 months will be decided by who sets the defaults during the gap — and on that, the smart bet is on whoever is shipping product at the seams.
02 · First coherent regime lands in California, not federal or EU.
03 · Labor disputes shift from strikes to opt-in licensing pools.
04 · Labs publicly restructure around multimodal product orgs.
IV. Where specialization survives the merger.
Not everything merges. The lesson from the cinema-to-television transition is that specialization survives where the human production capital — domain expertise, taste, on-set craft — does not transfer to the new medium for free. A general-purpose model lifts the floor; it does not raise the ceiling on judgment. Three places where this is already visible.
Vertical reasoning systems. Mustafa Bhuiyan's observation from §02 — that visual reasoning has to be conditioned on the deployment stack, not bought off the shelf — is the enterprise version of this. Long-tail physical environments, regulated industries, and sovereign-data domains do not absorb a frontier multimodal model cleanly. They absorb a specialized layer on top of one.
Realtime and on-device. David Chen's robotics-streaming thesis applies here too. The frontier multimodal model is moving toward cloud inference; the realtime layer between that model and the world is becoming its own specialization. Whoever owns that seam — voice, robotics, AR overlays, telepresence — owns a moat the foundation model cannot eat.
Taste, curation, brand. The Coframe interview and the Veo Hollywood pipelines are both bets that the human curation layer is where durable margin will live. As output volume goes vertical, the scarce resource is the judgment that says this one ships, that one doesn't. That is not a capability you scale by adding parameters.
V. Four predictions and the closing observation.
One. The 2026–2027 procurement pattern in enterprise will be explicitly multi-vendor and modality-agnostic. The Anthropic Economic Index's diversification finding will hold; single-vendor multimodal lock-in will not be the dominant pattern, regardless of which lab leads on benchmarks. Procurement teams should model two-to-three frontier-vendor stacks and a specialization layer on top.
Two. The first coherent legal regime to land on multimodal output will be a California statute, not a federal one and not an EU one. The combination of the existing performer-rights framework (AB 2602, AB 1836), the active disclosure bill, and the concentration of frontier labs in the state means California will set the de facto US baseline. Other states will harmonize within 18 months.
Three. By Q4 2026 the labor disputes around multimodal output will look less like SAG-AFTRA strikes and more like guild-licensing structures — voluntary opt-in pools that license likeness, voice, and motion to specific named uses, administered by a small number of intermediaries. The settlement structure of the landmark voice-clone case is the template.
Four. Within the labs, the org-structure consolidation Rocky names in The Foyer continues. The teams that ship coherent multimodal product will be teams whose research and product reporting lines already converged. The teams that ship incoherent product will be teams that did not. Expect at least one major lab to publicly restructure around this thesis in the next two issues of this report.
The closing observation is the one this issue exists to make: the merging of modalities is not, primarily, a model-quality story. It is a story about which institutions, labor structures, and legal categories survive contact with a production technology that does not respect the boundaries they were built around. The labs are not waiting for those institutions to catch up, and neither is the capital. The next eighteen months will be decided by who sets the defaults during the gap — and on that, the smart bet is on whoever is shipping product at the seams, not whoever is filing briefs about it.
— KN
§ 06 · Landscaping
Landscaping.
The economic and legal terrain around the merger.
Economics · Reading the Index
What the Anthropic Economic Index says about model-use diversification.
Issue 01 · Diversification of Model Use
Use cases are diversifying. So is the case for multiple models.
The Anthropic Economic Index (March 2026) reports that Claude.ai consumer use has become less concentrated since November 2025 — the top 10 tasks fell from 24% to 19% of conversations. The same report notes that API usage moved the opposite direction, becoming more concentrated as enterprises specialize.
19%
Share of Claude.ai conversations from top 10 tasks, Feb 2026 (down from 24% in Nov 2025)
33%
Share of API traffic from top 10 O*NET tasks, up from 28% — API usage concentrating as Claude.ai diversifies
21.6%
US share of global Claude.ai usage; India 7.2%, Brazil 3.7% — geographic concentration persists
39%
Share of Claude.ai conversations classified as directive (delegated tasks), up from 27% in late 2024
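The concentration figures above are top-k shares: the fraction of all conversations that fall in the k most common tasks. As a minimal sketch of that arithmetic, assuming per-task conversation counts are available (the task names and counts here are hypothetical toy data, not the Index's):

```python
from collections import Counter

def top_k_share(task_counts: dict[str, int], k: int = 10) -> float:
    """Fraction of all conversations that fall in the k most common tasks."""
    total = sum(task_counts.values())
    if total == 0:
        raise ValueError("no conversations to measure")
    top = Counter(task_counts).most_common(k)
    return sum(count for _, count in top) / total

# Hypothetical toy data, using k=2 to keep the example small.
earlier = {"coding": 30, "writing": 20, "research": 25, "other": 25}
later = {"coding": 22, "writing": 20, "research": 20, "tutoring": 19, "other": 19}

print(top_k_share(earlier, k=2))  # share held by the two largest tasks
print(top_k_share(later, k=2))    # lower value means usage is less concentrated
```

A falling top-k share over time is what the report describes as diversification. A rising one, as on the API side, is concentration.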
Read in combination with this issue's central thesis (that frontier multimodal models are absorbing capability across image, video, audio, and text), the diversification pattern tells a useful operational story. As single models become capable across more modalities, consumer use spreads across a longer tail of tasks rather than concentrating on the model's marquee capability. The Nano Banana team's observation that users came for the figurine trend but stayed for education is the consumer-side version of this. The same dynamic is now visible in the aggregate data.
For enterprise procurement, the implication runs in the other direction. As API usage concentrates and enterprises specialize, the case for standardizing on a single multimodal vendor weakens — the marginal task moves between vendors, modalities, and use cases too quickly to lock down. Procurement teams should expect to operate multi-vendor multimodal stacks for at least the next 18 months. The single-vendor procurement pattern that worked for cloud and SaaS does not transfer cleanly. This is consistent with the Dinner Table position that the system matters more than the model.
Three developments in the foreground, and a wider view of where policy currently sits.
Ruling · Federal · United States
Voice clone likeness case settles at landmark scale
A federal performer-rights case targeting unauthorized voice cloning of named talent has reached settlement with damages reportedly exceeding the next-highest AI likeness case by an order of magnitude. The structure of the settlement — opt-in licensing rather than blanket prohibition — is expected to shape industry norms.
EU clarifies "general purpose AI" applies to multimodal frontier
The European Commission has issued formal guidance that frontier multimodal models fall within the General Purpose AI provisions of the AI Act, triggering specific transparency, documentation, and risk-assessment requirements that the previous model-by-modality reading had left ambiguous.
Mayo–DeepMind diagnostic pilot clears Phase II under multimodal governance review
A joint multimodal diagnostic pilot — pairing imaging, dictated notes, and patient-history text inside a single inference call — has cleared its Phase II review under an FDA framework that treats merged-modality clinical models as a new evaluation category. Procurement and trial-design notes from the pilot are now circulating as a de facto template for hospital-system contracts.
Multimodal output disclosure for political and commercial use
State-level disclosure bills advancing through legislature. Expected signing decisions late summer.
Source: CA Legislative tracker
Status: Pending
California
Performer likeness in AI-generated video
Existing AB 2602 / AB 1836 framework now being tested in active litigation involving multimodal outputs.
Source: CA Department of Industrial Relations
Status: Active
United States · Federal
NO FAKES Act (voice / likeness)
Bipartisan federal proposal targeting unauthorized AI replicas. Multimodal scope clarification pending in committee.
Source: Congress.gov
Status: Pending
United States · Federal
Copyright Office guidance on AI-assisted works
Iterative guidance continues. Multimodal output co-authorship and registrability remain unresolved.
Source: US Copyright Office
Status: Signal
European Union
AI Act · GPAI provisions
Frontier multimodal models confirmed within GPAI scope. Transparency, documentation, and downstream-disclosure obligations active.
Source: European Commission
Status: Active
European Union
Performer and likeness rights · adaptation
Existing performer rights directives being reinterpreted to cover multimodal outputs. Member-state implementation varies.
Source: European Parliament research service
Status: Pending
United Kingdom
AI regulation framework
Sector-specific regulator approach maintained, with cross-cutting principles. Multimodal output not yet specifically scoped.
Source: UK DSIT
Status: Signal
Singapore
Model AI Governance Framework v2 · GenAI
Sector-agnostic principles framework updated for multimodal generative AI. Guidance non-binding but referenced in procurement.
Source: IMDA
Status: Active
China
Generative AI service regulation
Existing GenAI rules apply to multimodal outputs. Labeling and watermarking enforcement increasing.
Source: CAC
Status: Active
South Korea
AI Framework Act
Comprehensive AI act passed; implementation rules being drafted with attention to multimodal scope.
Source: National Assembly of Korea
Status: Pending
Legend: Active — currently in force or under enforcement. Pending — proposed, in legislative process, or in active drafting. Signal — early signal of position from government or regulator without enforceable action.
Interactive · California & the Modality Frontier
The California timeline, by modality.
California is moving faster than any other US jurisdiction on multimodal regulation. The chart maps every active or pending CA bill, executive action, and regulator signal from 2022 to 2026 against the output modalities each one targets. Bubble area is proportional to the estimated number of CA-nexus entities in scope: labs, platforms, studios, ad networks, production tools.
Legend: Text, Image, Audio, Video. Bubble area scale (est. CA-nexus entities in scope): ~50, ~250, ~700.
Source: CA legislative tracker, CA DIR, calmatters.org, official agency notices.
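Because the chart encodes entity counts as bubble area rather than radius, the radius has to grow with the square root of the count. The sketch below shows that conversion for the three legend values. The 40-unit maximum radius is a hypothetical drawing parameter, not a value taken from the chart.

```python
import math

def bubble_radius(entities: float, max_entities: float, max_radius: float = 40.0) -> float:
    """Radius such that bubble area (not radius) scales linearly with `entities`."""
    if entities < 0 or max_entities <= 0:
        raise ValueError("entities must be >= 0 and max_entities must be > 0")
    return max_radius * math.sqrt(entities / max_entities)

# Legend values from the chart; the largest one gets the maximum radius.
for n in (50, 250, 700):
    print(n, round(bubble_radius(n, 700), 1))
```

Scaling the radius linearly instead would make the ~700 bubble look roughly 196 times larger in area than the ~50 bubble, rather than 14 times, which overstates the difference.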
§ 07 · The Yard
The Yard.
A closing word, and the wire from the week.
Closing Word
See you on May 30.
The DeepMind & Gemini Build Day is where the merging-of-modalities thesis becomes demoable products at the seams between modalities. Issue 02 will cover what shipped, expand the regulatory-vacuum thread with reporting from the EU and Singapore, carry the next Portrait Gallery profiles, and set the next question for the Dinner Table on the perception-to-action merger.
Gemini multimodal release sets new benchmarks across image, video, and audio
Coverage of the latest Gemini family release shows the merged modality approach now leading on major multimodal evaluation benchmarks. The same model winning on image generation is winning on image understanding, video, and audio simultaneously.
Claude.ai usage diversifies; top 10 tasks fall from 24% to 19% of conversations
The latest Anthropic Economic Index reports use cases diversifying across Claude.ai as adoption matures. Geographic adoption remains uneven, with the US at 21.6% of global usage.
Singapore launches sovereign multimodal AI compute program
Singapore's IMDA announced a dedicated multimodal compute allocation program. The HK-Singapore corridor continues to set its own procurement and infrastructure agenda.