Building AI + XR apps faster with XR Blocks

Prototyping novel AI-driven XR interactions is a high-friction process, requiring low-level integration of on-device models, XR, and AI APIs. Ruofei Du, Interactive Perception & Graphics Lead at Google, presents XR Blocks (https://xrblocks.github.io), a cross-platform framework to accelerate human-centered AI + XR innovation. With the mission of "minimizing code from idea to reality", XR Blocks provides core abstractions and samples that empower creators to move from concept to interactive WebAI + WebXR prototypes with Gemini Canvas.

Published: Nov 26, 2025
Uploaded: Jun 13, 2026
File type: YouTube
Source: youtube.com

Full transcript

AI-generated transcript with timestamped sections.

0:05-1:40

[00:05] Greetings, friends from WebAI 2025. [00:12] Today, I'm thrilled to be here to announce XR Blocks, [00:15] a WebAI plus WebXR SDK to accelerate AI plus XR innovation. [00:23] My name is Ruofei Du. [00:26] I serve as the Interactive Perception and Graphics Lead at Google XR. [00:31] You can find my profile on Google Research [00:35] and Google Scholar, [00:36] and follow me at x.com. [00:40] My research mainly lies at the intersection of three fields: [00:44] human-computer interaction, [00:46] perception, and graphics, [00:48] where I strive to fuse information from both the physical world and the virtual world, [00:54] making them more interactive, accessible, and useful. [00:59] My talk today is about a brand new open source framework, XRBlocks.js. [01:06] XR Blocks is a one-year project contributed to by over 20 Googlers, [01:11] mostly part-time, from Google XR Labs. [01:14] Let's watch a short video first. [01:18] We present XR Blocks, an open-source framework designed to accelerate human-centered AI plus XR innovation. XR Blocks offers a high-level abstraction of AI-driven XR paradigms on both desktop and Android XR platforms, [01:33] including XR realism, with features like depth-aware physics, geometry-aware occlusion, and lighting estimation.

1:40-3:13

[01:40] XR interaction, [01:42] enabling custom gestures with on-device machine learning integration [01:46] and touching and grabbing of physical objects; and AI plus XR integration, allowing for the creation of XRPoet, [01:52] object understanding, and proactive conversational agents. We envision that XR Blocks will help amplify prototyping efforts, [01:59] empowering XR and AI creators to unleash their inner creativity. [02:04] Thank you. [02:06] I'm sorry. [02:07] No worries. Today, we are at a time when AI and XR are really converging to unlock a new paradigm of computing, [02:17] from immersive headsets, like Android XR headsets, [02:20] to helpful everyday AI glasses, [02:23] like the Project Astra we announced earlier this year and last year. [02:28] However, there's still a large gap between the two ecosystems of the two fields, [02:34] AI and XR. [02:36] While AI research and development is accelerated by mature frameworks [02:41] like JAX, TensorFlow, and PyTorch, and benchmarks like Hugging Face and LMArena, [02:49] XR often requires practitioners, creators, and developers to manually integrate [02:56] disparate [02:57] low-level systems for perception, [03:00] rendering, and interaction, [03:02] and also to upgrade their Unity versions again and again and again over the years. [03:07] Six years ago, we presented the ARCore Depth Lab on mobile phones using Unity.

3:13-4:45

[03:13] But it is non-trivial to migrate to XR due to the fragmented nature of the Unity ecosystem, the different headset APIs, and the difference between mobile phone AR and headset XR. [03:27] A lot of AR interfaces were developed on mobile phones over my past years, but it's very hard to migrate them to XR, [03:35] and we really want to change that today. [03:38] Last year, we presented Visual Blocks at WebAI Summit 2024 as an open source framework to lower the barrier for development of machine learning [03:47] multimedia applications with a no-code node graph editor. [03:51] You can scan the QR code and try out the system today. [03:54] And everywhere you can just drag and drop different modules of camera input, Gemini nodes, and on-device MediaPipe models to create a new AI pipeline. [04:05] And later we also showcased the InstructPipe research prototype, which won an honorable mention award at CHI 2025. This empowers creators to quickly author an on-device AI pipeline with a single prompt. [04:20] For example, I say I want to quickly try out a virtual try-on with sunglasses from Google Search, and it creates a visual pipeline like this. [04:29] And what we found with all these projects is that [04:32] XR [04:33] is not yet as scalable as AI on device, on the web. [04:38] It is usually fragmented across varied platforms, [04:41] programming languages, [04:43] and interaction paradigms.

4:46-6:16

[04:46] So we wonder, how can we make AI plus XR research and innovation more accessible and scalable? [04:54] Today's XR research is oftentimes a one-time thing: [04:58] you make the prototype, [05:00] you do the user study, [05:01] but nobody really reuses it again, and the wheels have always been reinvented over the past few years. [05:08] So how can we kick off the flywheel from AI to XR? [05:13] Or how can we make XR really easy and fast, to allow innovators to focus on the really cool part [05:20] rather than the low-level integration? [05:23] The easy and fast assumption in traditional web coding is technically false in XR today, because there's basically no existing ecosystem to understand, [05:33] or even quickly author, a 3D model and allow people to drag it around and interact with it. [05:38] But our goal is to deliver a great set of tools [05:41] for XR plus AI use cases on the web. [05:45] To start the journey, we tried a lot of languages and toolkits to do one simple thing: [05:50] using the pinch, click, and touch gestures on a mobile phone, headset, or laptop to change the color of a cube. [05:58] But astonishingly, the minimum coding requirement is easily over 200 lines of code. [06:05] And even with Unity, where it's already very simple, you still have to manually install some packages to adapt to Meta Quest, Apple Vision Pro, and Android XR headsets to make it work.

6:16-7:47

[06:16] And it takes more than triple the coding time than the compiling time, actually, to really deploy it on device. [06:24] And the minimum still requires too much coding time. [06:28] But here today, with XR Blocks, we strive to use the minimum code to make XR perceptive experiences really simple. [06:37] For example, here, with only 39 lines of code, [06:39] you can create a cube and use a pinch [06:42] or a touch on a mobile phone to change the color. [06:45] You start with a very simple import in JavaScript, and we work closely with three.js, and also with Ricardo. And then you can write a simple script. [06:55] Here, it's even less than 30 lines, maybe only 15 lines. You can create the main logic to render a cube, and when updating, it will rotate and change its direction in XR. And you can try the same code on desktop, mobile phone, and Android XR headset. [07:18] And today, you can check out all our samples at our website, xrblocks.github.io. [07:24] We provide a variety of templates and samples [07:29] for both human and AI creators to learn from our best practices. [07:35] And inspired by existing game engines, like Unity, we would like creators [07:41] to really focus on the core idea of an XR application: [07:46] the script.
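
Editor's note: the talk shows the cube sample on a slide that the transcript does not capture. The following is only a minimal sketch of the logic described from [06:37] to [07:18], a cube that rotates on every update and changes color on a pinch, click, or touch. It deliberately uses neither the XR Blocks nor the three.js API, so every name in it (`SpinningCube`, `update`, `onSelect`, `PALETTE`) is hypothetical.

```typescript
// Hypothetical sketch: none of these names come from XR Blocks or three.js.
type Color = [number, number, number]; // RGB components in the range 0..1

const PALETTE: Color[] = [
  [1, 0, 0], // red
  [0, 1, 0], // green
  [0, 0, 1], // blue
];

class SpinningCube {
  rotationY = 0; // current rotation around the vertical axis, in radians
  colorIndex = 0; // index of the current color in PALETTE

  constructor(readonly radiansPerSecond: number = Math.PI / 2) {}

  get color(): Color {
    return PALETTE[this.colorIndex];
  }

  // Called once per rendered frame with the elapsed time in seconds.
  update(deltaSeconds: number): void {
    const fullTurn = 2 * Math.PI;
    this.rotationY =
      (this.rotationY + this.radiansPerSecond * deltaSeconds) % fullTurn;
  }

  // Called when the user pinches, clicks, or taps the cube.
  onSelect(): void {
    this.colorIndex = (this.colorIndex + 1) % PALETTE.length;
  }
}
```

In a real application, a renderer would read `rotationY` and `color` each frame; the point of the sketch is only that the creator-facing script can stay this small when perception, rendering, and input plumbing live in the framework.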

7:47-9:20

[07:47] And whenever a creator wants to call the user, call the world, or summon an interface, [07:53] or even an agent in the future, or build communication between the agents and the peer user, [07:59] this should be ready to go. [08:01] Note that today we are only halfway along this roadmap, and we welcome community contributions to complete the missing puzzle pieces. [08:11] For example, here's an idealized, minimized syntax in XR Blocks, [08:16] where perception and other low-level details should be hidden, [08:19] and creators should really, really focus on the core logic of the invention: [08:24] just create a poem from the external camera. [08:28] Thank you. [08:29] To start with, we chose WebXR, three.js, and Gemini as example building blocks for our framework. [08:36] Yet I do believe that with more contributors, we can extend our vision to native C++ with OpenXR and Unity. [08:45] Our vision is to build the set of interactive primitives [08:49] for web coding for XR. [08:51] And today, many of our WebAI Summit attendees already tried our demo, and one of the most amazing demos I saw today was, like, create a Brazilian soccer player. Gemini Canvas can actually create the 3D Gemini player, and you can pinch and press [09:07] and drag it around. [09:08] Our North Star is to turn ideas into reality, as depicted in this diagram, so that [09:16] AI can really help creators to execute it,

9:21-10:52

[09:21] and create at the speed of thought [09:23] to maximize human creativity. [09:26] To achieve this vision, we implemented this set of tools using these low-level subsystems within the SDK, including AI, module, camera, depth, [09:39] lighting estimation, physics, sound, [09:41] input, agent, UX, effect, UI, and most importantly, the simulator. Because back in the day, it was very troublesome to deploy in XR. Oftentimes, you code something, and only by putting on the XR headset can you see how your demo works in reality. [09:58] But here, we provide you a simulator that can simulate the depth map, lighting estimation, and hand gestures. So you can see whether your thumbs-up gesture really works using LiteRT.js. And the same code should naturally work on Android XR headsets. [10:15] Here are some examples. We provide you a model viewer that allows developers to quickly wrap a geometry primitive, a 3D model, or even a 3D Gaussian splatting instance with a model viewer so you can pinch and drag it around in XR. [10:31] And we provide a spatial UI library with signed distance function libraries to render high-quality text, [10:38] and the basic composable APIs to do [10:41] generative user interfaces. [10:44] And it is empowered by LiteRT.js, which Matthew talked about earlier, and we have a close collaboration as first-party users.

10:52-12:29

[10:52] We allow creators to simulate hand gestures, like thumbs up and victory sign, [10:57] and to run machine learning models on device. [10:59] So no gesture data goes to the server, and you have full privacy on the Android XR headset. [11:06] You can also experience spatial audio and geometry-aware weather effects and see the raindrops on your hands. [11:12] We'll show you a real-world demo later. And empowered by Gemini, you can recognize all the objects around you, and when you reach out your hands to the objects, you can ask Gemini questions. For example, where can I buy this coffee table? [11:26] And here are the real-time demos on the Android XR headset, which is going to be released later this year, [11:33] featuring XR realism with depth sensing on the web. [11:37] You can just pinch to shoot colorful balls around your environment, and we use on-device depth sensing algorithms, and you can see the raindrops dropping on your hands. [11:48] And this is using LiteRT.js. You can do a thumbs up to summon balloons and use a victory sign to summon the colorful strips. [11:56] And you can use a dynamic wave gesture to go to the next photo when you are using a future photo tab. [12:03] And there are more AI plus XR use cases. For example, you can generate a poem with a video see-through camera, recognize the objects around you, and ask Gemini, what's the calorie count of the fruits? [12:17] And finally, we envision a growing set of interaction primitives [12:22] in the two cases, to unify the basic interactions. For example, the hand pinch, the mouse click, and the

12:29-14:03

[12:29] screen touch should be unified as select. We also provide some samples so that you can grab and touch objects, and we hope to leverage community contributions to finalize this roadmap and the interaction paradigms. [12:44] Additional details are illustrated in our arXiv paper. Feel free to check it out: XR Blocks on arXiv. [12:54] And we hope that with a growing community of innovators, [12:58] we can make AI really saturated in XR, [13:01] turning ideas into reality, [13:04] and allow everyone to unleash their inner creativity. [13:09] To give you some inspiration, here are some examples from our amazing UX engineers and designers, [13:14] starting with the art gallery. [13:16] It's purely done with XR Blocks and a no-code environment in Gemini Canvas. Just prompt something like "create an infinite gallery", and by iterating and prompting Gemini Canvas, you can [13:28] click on each art piece and [13:31] go to the next art by selecting the keywords, on both laptops and Android XR headsets. [13:37] You can build a procedural city by [13:39] clicking and pinching on the virtual map. [13:42] And this bubble XR is built by me: I just summon the bubbles and use my hand touch to dismiss [13:48] all the bubbles, [13:49] using our XR Blocks SDK. You can check out our demonstration, and we have a live working demo of these. [13:57] Finally, I would like to deeply thank all my XR Blocks contributors across Google over the past year.
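
Editor's note: the idea from [12:17] to [12:29], that a hand pinch, a mouse click, and a screen touch should all be treated as one "select" interaction, can be sketched as a small normalization layer. This is an illustration under stated assumptions rather than the XR Blocks implementation: the `RawInput` shapes, `toSelect`, and `SelectDispatcher` are hypothetical names, and the filtering rules (primary mouse button only, single-finger touch only) are assumptions made for the example.

```typescript
// Hypothetical sketch: these types and names are illustrative and are not
// taken from the XR Blocks, WebXR, or DOM APIs.
type RawInput =
  | { kind: "pinch"; hand: "left" | "right" } // hand tracking on a headset
  | { kind: "click"; button: number } // mouse on a desktop
  | { kind: "touch"; fingers: number }; // touchscreen on a phone

interface SelectEvent {
  type: "select";
  source: RawInput["kind"]; // which device produced the select
}

type SelectHandler = (event: SelectEvent) => void;

// Maps a device-specific input to the unified select event, or null when the
// input should not count as a select (assumed rules, see lead-in).
function toSelect(input: RawInput): SelectEvent | null {
  if (input.kind === "click" && input.button !== 0) return null;
  if (input.kind === "touch" && input.fingers !== 1) return null;
  return { type: "select", source: input.kind };
}

class SelectDispatcher {
  private handlers: SelectHandler[] = [];

  // Application code registers one handler regardless of device.
  onSelect(handler: SelectHandler): void {
    this.handlers.push(handler);
  }

  // Returns true when the raw input was delivered as a select event.
  dispatch(input: RawInput): boolean {
    const event = toSelect(input);
    if (event === null) return false;
    for (const handler of this.handlers) {
      handler(event);
    }
    return true;
  }
}
```

With a layer like this, application code registers a single `onSelect` handler and behaves the same on a headset, a laptop, and a phone, which matches the talk's claim that the same script should run across all three.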

14:04-14:08

[14:04] And thank you everyone for listening, watching and contributing. Thank you.
