Web AI leaps forward on Intel AI PCs
Moh Haghighat, Intel Fellow, showcases the latest performance advancements in WebGPU and WebNN on Intel AI PCs. Tune in for a sneak peek of Panther Lake, Intel’s next-generation client processor, designed to unleash new AI experiences and expected to launch in early 2026, elevating the web into a premier platform for AI.
- Published: Nov 21, 2025
- Uploaded: Jun 13, 2026
- File type: YouTube
- Source: youtube.com
Full transcript
AI-generated transcript with timestamped sections.
[00:04] It is a distinct privilege to be on the stage here again this year. And I'm very excited to be presenting to you [00:13] the work of the Intel Web Platform Engineering Team, the team that has contributed over [00:20] 15,000 patches to the Chromium project, [00:24] the open source foundation of the Chrome browser, as well as the Edge browser. [00:29] I'm Moh Haghighat, Intel Fellow. I lead all of Intel's work in the area of web technologies. At Intel, we have been proud of Moore's Law, which has been driving the computing industry. Essentially, it is about doubling the number of transistors on chips every year or 18 months. It seems Jason is establishing Mayes' Law, which is doubling the size of the Web AI audience every year, and I wish him the best of luck. [00:54] So last year, when I talked here, we had already released our [01:00] flagship product, the Lunar Lake AI PC, with three execution engines: CPU, GPU, and NPU. CPU for fast, quick, small [01:10] AI inference, GPU for high-throughput large models, and NPU for use cases that require sustained, power-efficient execution. [01:21] Now today, I will talk about the advancements that we have made on WebNN, on WebGPU, as well as in the W3C Web Machine Learning Working Group. [01:32] And I will give you a glimpse of our exciting next-generation product, codenamed Panther Lake, which will be released in January. We announced it last week.
[01:46] Okay. [01:46] Basically, we have had two major advances on WebNN. [01:54] We have doubled the performance of WebNN [01:57] on GPU, [01:58] and we have greatly increased the coverage of WebNN [02:02] on NPU. WebNN is an emerging standard web API being defined at W3C. My brilliant colleague, [02:13] from Microsoft are the co-editors, and we have been working very closely with the Google browser team on having that in the Chrome browser and in the Edge browser. Doubling the performance of WebNN execution on GPU: WebNN is the only web API for general inference that can run on all these execution engines, CPU, GPU, and NPU. [02:43] vendor execution provider. For us, that EP is OpenVINO, which is highly optimized for CPU, GPU, and NPU. The implementation has been simplified. We used to go to XNNPACK through TFLite for CPU and to DirectML for GPU and NPU. Now we just go through WinML and the OpenVINO Execution Provider for all of these engines. And we get major improvements across the board, [03:13] more than doubling the performance on GPU.
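The "same API, three engines" point can be sketched in TypeScript. This is a minimal sketch, not the WebNN specification or Intel's implementation: `MLLike` is a hypothetical, trimmed-down stand-in for the browser's `navigator.ml` object, the `deviceType` option shape is an assumption that should be checked against the current WebNN draft, and the fallback order after the first choice is invented for illustration. Only the first choice per use case comes from the talk (CPU for quick small inference, GPU for throughput, NPU for sustained power-efficient work).

```typescript
// The three execution engines named in the talk.
type DeviceType = "cpu" | "gpu" | "npu";

// Use cases as described in the talk; the labels themselves are hypothetical.
type UseCase = "quick-small" | "throughput-large" | "sustained-low-power";

// Hypothetical minimal shape of a WebNN-like entry point (assumption).
interface MLLike {
  createContext(options: { deviceType: DeviceType }): Promise<unknown>;
}

// First entry follows the talk; the remaining fallback order is an assumption.
function preferredDevices(useCase: UseCase): DeviceType[] {
  switch (useCase) {
    case "quick-small":
      return ["cpu", "gpu", "npu"];
    case "throughput-large":
      return ["gpu", "npu", "cpu"];
    case "sustained-low-power":
      return ["npu", "gpu", "cpu"];
  }
}

// Try each device in preference order and return the first context that is created.
async function createPreferredContext(
  ml: MLLike,
  useCase: UseCase
): Promise<{ device: DeviceType; context: unknown }> {
  let lastError: unknown = new Error("no device was tried");
  for (const device of preferredDevices(useCase)) {
    try {
      const context = await ml.createContext({ deviceType: device });
      return { device, context };
    } catch (err) {
      lastError = err; // This engine is unavailable; fall through to the next one.
    }
  }
  throw lastError;
}
```

In a browser that exposes WebNN, the `ml` argument would be the `navigator.ml` object. The application code stays the same whichever engine ends up being used.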
[03:17] Now I'm going to show you a number of demos side by side, where we were last year and where we are right now. On the left side is last year, and on the right side [03:30] is right now, basically showing the progress in software [03:35] on the same hardware. [03:38] Segment Anything: you see we used to run that in like 30, 34, 38 milliseconds. Now we are doing it in 13 milliseconds, 12 milliseconds, [03:48] more than twice as fast. 12 milliseconds is amazing. [03:52] Next, I will show a demo of Whisper-based speech recognition. Last year we had it at [04:02] 43 tokens per second. [04:05] And now it is 100 tokens per second on GPU, and it is 98 tokens per second on NPU, [04:13] really, really fast, with the same API. Next, I'm going to show you Stable Diffusion. You ask the system to generate an image with certain characteristics. Last year, we were happy that we had just got below one second, [04:28] 900 milliseconds or so. Now we are doing it in 400 milliseconds [04:34] on GPU and 600 milliseconds on NPU. I have a demo by the table. Just come there, the laptop is there, and you can experiment with that. It's pretty amazing. 400 milliseconds, less than half a second.
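The year-over-year figures quoted for these demos can be turned into speedup ratios with a few lines of TypeScript. This is a rough worked calculation: the talk gives ranges for Segment Anything (30 to 38 ms before, 12 to 13 ms after), so the single values 34 ms and 13 ms used below are picked from those ranges, and the function and field names are illustrative.

```typescript
// Latency: lower is better, so speedup = before / after.
function latencySpeedup(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

// Throughput: higher is better, so speedup = after / before.
function throughputSpeedup(before: number, after: number): number {
  return after / before;
}

// GPU figures as quoted in the talk (Segment Anything values picked from the quoted ranges).
const gpuDemos = [
  { name: "Segment Anything (ms)", speedup: latencySpeedup(34, 13) },
  { name: "Whisper (tokens/s)", speedup: throughputSpeedup(43, 100) },
  { name: "Stable Diffusion (ms)", speedup: latencySpeedup(900, 400) },
];

for (const demo of gpuDemos) {
  console.log(`${demo.name}: ${demo.speedup.toFixed(2)}x`);
}
```

All three ratios come out above 2, which matches the "more than doubling" claim for GPU.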
[04:50] Much, much faster than if you go to ChatGPT and ask for that to be generated. Everything local, everything private. [04:57] Next, here I'm going to show you a number of demos that are possible on NPU because of the great contribution by Hugging Face and Joshua Lochner, who will be the speaker after me. He brought 20 top Whisper Transformers.js models to WebNN. And on the left side, you see the execution of Depth Anything, that is, depth estimation. [05:27] And at the bottom of the video, you can see Task Manager demonstrating that the GPU is busy, with WebNN running on GPU. And on the right side, you see the NPU is busy doing the work. [05:42] Here, I'm going to show you background removal, another demo of Transformers.js. I'm sitting in my [05:52] home office, and I toggle to see how quickly it removes the background. And those are, by the way, real books. And you see on GPU and on NPU really good performance, happening, again, with the same API, being able to run on all these engines. [06:10] And next, here you see object detection. I'm showing an object, and the object is detected. Left side, again, on GPU, right on NPU. Here I'm showing it a book, and I think I should charge Eric Schmidt for advertising the book, especially when this goes on YouTube, or a bottle.
[06:40] on NPU. The same API is capable of delivering such astounding performance. [06:47] Thank you. [06:48] We have also [06:50] enabled GPU on Windows 10, which doesn't have Windows ML, as well as on Linux. Last year, we could only run WebNN on CPU through XNNPACK. Now, in addition to CPU, thanks to the highly optimized kernels of ML Drift, [07:08] we can actually run that on a GPU on both older systems, Windows 10, as well as on Linux. And one major optimization that is ongoing, it's in the works, is optimizing the buffer reuse between CPU, GPU, and NPU. Already, we see an 18% performance improvement on the communication between CPU and GPU, and a 50% performance improvement on the KV cache. [07:38] GPU interoperability so that applications can use these two APIs simultaneously with communication between them minimized. For example, Google Meet might use certain things with WebNN and certain things with WebGPU. And again, communication is highly optimized. [07:56] Now, WebGPU: three major advances, enhanced memory bandwidth utilization, improved thread occupancy with SIMD, as well as enabling Intel Matrix Extensions, XMX.
[08:26] the width of the SIMD operation. We are doing all of this under the hood. As developers, you don't have to worry about that. As a result, matrix multiply, which is the primitive for AI, is improved in performance by a factor of almost 2x. [08:56] from 15 tokens per second to 20 tokens per second because of this optimization. Now we have XMX enabling in the works. We have already enabled that on the Vulkan path. And there, basically, WebGPU takes advantage of these matrix extension capabilities, engines that are in the Intel platform. [09:26] built-in AI that Parisa mentioned already can enjoy this major improvement. And we are also working with Microsoft on enabling XMX through D3D12. That is work in progress. Hopefully, we'll get there, and you will have it on the Chrome browser on Windows as well. [09:45] And here is the speedup that we are already seeing on the Vulkan path, like a 1.8x [09:52] speedup. This is an amazing speedup on WebGPU through these lower-level optimizations.
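For readers unfamiliar with why matrix multiply is sensitive to memory bandwidth, here is a minimal CPU-side sketch in TypeScript. It is not a WebGPU kernel and not Intel's optimization. It only illustrates the general idea of tiling, where each loaded block of the inputs is reused for several multiply-adds before moving on. The function name `matmulTiled` and the default tile size are assumptions made for illustration.

```typescript
// Multiply an m×k matrix A by a k×n matrix B (both row-major) into an m×n matrix C.
// Work is done tile by tile so that small blocks of A and B are reused while they are "hot".
function matmulTiled(
  a: Float32Array,
  b: Float32Array,
  m: number,
  k: number,
  n: number,
  tile: number = 16
): Float32Array {
  const c = new Float32Array(m * n); // Zero-initialized accumulator.
  for (let i0 = 0; i0 < m; i0 += tile) {
    for (let p0 = 0; p0 < k; p0 += tile) {
      for (let j0 = 0; j0 < n; j0 += tile) {
        // Clamp tile bounds so sizes that are not multiples of `tile` still work.
        const iMax = Math.min(i0 + tile, m);
        const pMax = Math.min(p0 + tile, k);
        const jMax = Math.min(j0 + tile, n);
        for (let i = i0; i < iMax; i++) {
          for (let p = p0; p < pMax; p++) {
            const aip = a[i * k + p]; // Reused across the whole inner j loop.
            for (let j = j0; j < jMax; j++) {
              c[i * n + j] += aip * b[p * n + j];
            }
          }
        }
      }
    }
  }
  return c;
}

// Usage: a 2×2 example small enough to check by hand.
const left = new Float32Array([1, 2, 3, 4]);
const right = new Float32Array([5, 6, 7, 8]);
console.log(Array.from(matmulTiled(left, right, 2, 2, 2)));
```

On a GPU the same idea shows up as workgroup-level tiles and wide SIMD loads, and dedicated matrix engines take over the inner multiply-accumulate.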
[09:59] Now, [10:01] it is a privilege to be giving you a glimpse of our major next client platform, codenamed [10:09] Panther Lake. It is the first client platform on Intel 18A technology. It offers a [10:17] 50% faster CPU, a 50% faster GPU, and an enhanced power-efficient NPU. It delivers 180 [10:28] tera operations per second. And if you recall, Lunar Lake had up to 120 [10:36] tera operations per second. It's a major improvement across CPU and GPU, and a major efficiency improvement [10:44] on NPU. [10:46] Panther Lake introduces the Xe3 GPU for scaled performance without compromising power efficiency. [10:57] Scalability is the name of the game here. Xe3 supports up to 12 Xe cores. Different configurations can have different numbers of Xe cores. Each Xe core has [11:12] eight XMX engines. The Xe3 XMX engine is the primary integrated AI acceleration engine. It supports [11:24] up to like 1024 32-bit operations, or 2048 16-bit operations, or 4096 8-bit operations per clock cycle. It is mind-boggling. In an integrated GPU,
[11:42] each core... [11:44] running basically 4096 8-bit operations. This is like a supercomputer on an integrated GPU. [11:54] Panther Lake also enhances its AI capabilities with a newly designed NPU 5, delivering high performance in a small footprint, optimized for both area and power efficiency. Compared to Lunar Lake, [12:12] it has about 40% better performance per area, which means 40% more power efficient. [12:20] It is expected to be released in January. [12:24] Web AI is going to shine [12:27] on Panther Lake. [12:29] Thank you. [12:30] Thank you. [12:31] We have also had major progress at the W3C Web [12:37] ML Working Group, which is chaired by my colleague Anssi Kostiainen. We have had [12:45] 30% growth [12:47] in companies and organizations participating in the WebML and WebNN working group. And new community groups have been formed for agentic experiences, WebMCP there, and built-in AI. About 200 participants we have there. Newcomers to the official working group include Hugging Face
[13:14] Joshua Lochner, the next speaker, personally represents Hugging Face, and I especially thank him for his major contribution to WebNN, as well as Qualcomm, as well as ARM, Will Lord will talk before noon today, and NVIDIA and others. So major progress in WebNN: [13:35] a coalition in defining these ubiquitous APIs that will run on all browsers on all hardware platforms. Very, very pleased with that. [13:48] WebNN is available with Windows ML, which is in general availability right now. You can download and enjoy all these experiences. Join us in shaping the future of WebNN. Very excited about that. And here I have additional links. [14:18] listened to me. Thank you so much.
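Returning to the Xe3 figures quoted around [11:12] to [11:44], the per-clock numbers can be worked through in TypeScript. This is a back-of-the-envelope sketch under stated assumptions. The transcript is ambiguous about whether 4096 8-bit operations per clock applies to one XMX engine or to one Xe core, and the code takes the per-core reading from [11:44]. The talk gives no GPU clock frequency, so `clockGHz` is a purely hypothetical input and the result is not an Intel specification.

```typescript
// Operations per clock scale inversely with operand width: 1024 at 32 bits (from the talk).
function opsPerClockForWidth(bits: 8 | 16 | 32): number {
  const opsAt32Bit = 1024;
  return (opsAt32Bit * 32) / bits;
}

// Per-clock 8-bit throughput of a whole GPU, assuming the figure is per Xe core.
function gpuInt8OpsPerClock(xeCores: number): number {
  return xeCores * opsPerClockForWidth(8);
}

// Peak tera-operations per second for a hypothetical clock frequency in GHz.
function peakTops(opsPerClock: number, clockGHz: number): number {
  return (opsPerClock * clockGHz * 1e9) / 1e12;
}

// Largest configuration mentioned in the talk: 12 Xe cores.
const maxConfigOpsPerClock = gpuInt8OpsPerClock(12);
console.log(maxConfigOpsPerClock); // 49152

// Hypothetical 2.0 GHz clock, chosen only to show how the conversion works.
console.log(peakTops(maxConfigOpsPerClock, 2.0).toFixed(1));
```

Halving the operand width doubles the operations per clock, which is why 8-bit inference is the headline number for these engines.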