NVIDIA T4 with Ian Buck and Kari Briski

Today on the podcast, we speak with Ian Buck and Kari Briski of NVIDIA about new updates and achievements in deep learning. Ian begins by telling hosts Jon and Mark about his first project at NVIDIA, CUDA, and how it has helped pave the way for future projects in supercomputing, AI, and gaming. Kari then explains how customers use the platform for AI, grouping their applications into computer vision, speech and audio, and machine comprehension.

NVIDIA's latest GPUs include Tensor Cores, mixed-precision cores that deliver their peak AI performance, and NVIDIA has been working to make that performance easier to reach. Its newly announced TensorFlow AMP (automatic mixed precision) brings NVIDIA's AMP acceleration into the TensorFlow framework: it automatically picks the right precision for each part of a neural network and maximizes performance while maintaining accuracy, with only a one- or two-line change to a TensorFlow script.

Just last year, NVIDIA announced its T4 GPU with Google Cloud Platform. T4 is designed for inference, the other side of AI. Because AI models are becoming so advanced and complex, GPUs on the inference side have to handle that workload and return results just as quickly, and T4 on Google Cloud is built to do exactly that. Alongside T4, NVIDIA offers TensorRT, a software framework for AI inference that's integrated into TensorFlow.

Ian Buck

Ian Buck is general manager and vice president of Accelerated Computing at NVIDIA. He is responsible for the company's worldwide datacenter business, including server GPUs and the enabling NVIDIA computing software for AI and HPC used by millions of developers, researchers and scientists. Buck joined NVIDIA in 2004 after completing his PhD in computer science at Stanford University, where he was development lead for Brook, the forerunner to generalized computing on GPUs. He is also the creator of CUDA, which has become the world's leading platform for accelerated parallel computing. Buck has testified before the U.S. Congress on artificial intelligence and has advised the White House on the topic. Buck also received a BSE degree in computer science from Princeton University.

Kari Briski

Kari Briski is a Senior Director of Accelerated Computing Software Product Management at NVIDIA. Her talents and interests include Deep Learning, Accelerated Computing, Design Thinking, and supporting women in technology. Kari is also a huge Steelers fan.

Cool things of the week
  • Kubernetes 1.14: Production-level support for Windows Nodes, Kubectl Updates, Persistent Local Volumes GA blog
  • Stadia blog
  • How Google Cloud helped Multiplay power a record-breaking Apex Legends launch blog
  • Massive Entertainment hosts Tom Clancy’s The Division 2 on Google Cloud Platform blog
  • NVIDIA site
  • NVIDIA Catalog site
  • CUDA site
  • Tensor Cores site
  • TensorFlow site
  • Automatic Mixed Precision for Deep Learning site
  • Automatic Mixed Precision for NVIDIA Tensor Core Architecture in TensorFlow blog
  • TensorFlow 2.0 on NVIDIA GPU video
  • NVIDIA Volta site
  • NVIDIA T4 site
  • WaveNet blog
  • BERT blog
  • Compute Engine site
  • T4 on GCP site
  • Webinar On Demand: Accelerate Your AI Models with Automatic Mixed-Precision Training in PyTorch site
  • PyTorch site
  • NVIDIA TensorRT site
  • TensorRT 5.1 site
  • Kubernetes site
  • RAPIDS site
  • NVIDIA GTC site
  • Deep Learning Institute site
  • Kubeflow Pipelines Docs site
  • Kubeflow Pipelines on GitHub site
  • NVIDIA RTX site
Question of the week

Where can we learn more about Stadia?

Where can you find us next?

Mark will be at Cloud NEXT, ECGC, and I/O.

Jon may be going to Unite Shanghai and will definitely be at Cloud NEXT, ECGC, and I/O.

NVIDIA will be at Cloud NEXT and KubeCon, as well as the International Conference on Machine Learning (ICML), the International Conference on Learning Representations (ICLR), and CVPR.

[MUSIC PLAYING] MARK: Hi, and welcome to episode number 168 of the weekly Google Cloud Platform Podcast. I'm Mark Mandel, and this week, I'm here with my colleague, Jon Foust. Jon, how are you doing?

JON: I'm doing great. How about you?

MARK: Doing all right. GDC is over, so I'm very happy.

JON: I actually kind of miss it.

MARK: You miss it? It was a lot.

JON: Yeah. It was my first one, so I had a lot of fun.

MARK: It was pretty awesome. It was pretty awesome. So yeah, this week we have Ian Buck and Kari Briski joining us from NVIDIA to talk to us all about their new NVIDIA T4s and what you can do with them on Google Cloud.

JON: Sounds awesome. And we will also get into our question of the week, especially following the new announcement of Stadia. So we'll give you a little more information on that.

MARK: Yeah, it will be really good. But before we do that, why don't we get stuck into our cool things of the week? We do have a lot of gaming content, but I do want to do what I pretty much always do, which is mention Kubernetes. So congratulations to the Kubernetes team. They recently released version 1.14, which has some pretty amazing stuff in it.

The thing that I'm particularly excited about is production-level support for Windows nodes. I've been waiting for that for a really long time, so I'm very excited about that, as well as Kustomize integration in kubectl and all sorts of other good things. So wait for that to come to your favorite cloud provider near you shortly.

JON: I can't wait. I'm a big Windows user. Following that, of course we're going to talk about the Stadia release.

MARK: Oh yeah, that little thing.

JON: Stadia's that amazing new thing. So Stadia is a new video game platform delivering instant access to your favorite games on any type of screen. That means TV via Chromecast, on your laptop through Chrome, your desktop through Chrome, tablet, or mobile phone. And we will be launching later this year in select countries. And those countries include the US, Canada, UK, and much of Europe. I don't know about you, but I actually got my hands on playing Stadia at GDC, and it was amazing playing "Assassin's Creed Odyssey."

MARK: Yeah, I love this-- just the fact that I don't need a console, I don't need a PC. I can stream the games straight to my computer. All I'm waiting for is this experience where I'm sitting in front of my computer and I'm like, hm, I guess I'm going to play "Assassin's Creed Odyssey" rather than work on this spreadsheet. New tab, go into Stadia. Now I'm playing the game. I'm done.

JON: Yeah, it's amazing.

MARK: Fantastic.

JON: And it also removes the need for those large gaming rigs that everybody likes to build. I have two, personally.

MARK: Oh yeah, you have two. I didn't know that.

JON: I have two. Yeah, so imagine just sitting in the airport and being able to play your favorite triple-A titles that potentially come to Stadia. It's kind of amazing.

MARK: Yeah. I mean, I do that now in airports, but I lug around a 17-inch gaming laptop that weighs a ton. I don't want to do that. Awesome, so moving on.

Yeah, we have a bunch of gaming stuff here. There was a blog post that came out recently talking about how Google Cloud helped Multiplay power "Apex Legends" and their massive, massive, massive launch. So if anyone's familiar with the free-to-play shooter, "Apex Legends," it broke a whole bunch of records, which is fantastic. During the first eight hours of its debut, it reached up to 1 million unique players, which is pretty amazing.

JON: That is.

MARK: Within the first 72 hours after its initial launch, "Apex Legends" reached 10 million players, and has now reached 50 million unique players after just one month. Oh my god.

JON: Wow.

MARK: So this was facilitated by Unity's Multiplay team. They helped the "Apex Legends" team distribute, scale, and grow the game. They have a particularly unique hybrid scaling technology that handles the majority of the demand for "Apex Legends" with Google Cloud, while also using their own network of bare metal data centers. Yeah, they run up a lot-- a lot-- of Google Compute Engine instances to make this happen. It's pretty amazing. So congratulations to "Apex Legends" and to the team at Multiplay. They have done an excellent job, and we're very happy to have them on Google Cloud.

JON: Yeah, I've actually played "Apex Legends" just a bit, and it's actually pretty fun. So it's amazing learning where the team has gone with that. Following that, we are announcing that Massive Entertainment has selected Google Cloud as the public cloud provider to host the game servers globally for the highly anticipated sequel to "The Division," "The Division 2." The reason we were chosen is primarily that Google Cloud's secure, high-speed fiber network allows for consistent high-performance experiences across regions. That supports game data and other core services, such as matchmaking, high scores, stats, inventory, and of course, seamless gameplay.

MARK: Nice. So yeah, lots of good game stuff, unsurprisingly, coming out of GDC. Why don't we go have a chat with Ian and Kari and hear all about NVIDIA's T4s?

JON: I'm very excited to talk to both Ian and Kari.

MARK: So super excited to be in the very fancy offices of NVIDIA today.

JON: Super impressive-- I actually like the color scheme a lot. Dark colors are definitely my thing.

MARK: Green and black-- very nice. We are joined today by Ian Buck, GM and vice president of Accelerated Computing, and Kari Briski, senior director of Accelerated Computing software product management.

KARI: It's a mouthful.

MARK: How are you both doing today?

IAN: Great, thank you.

MARK: Excellent.

IAN: Welcome.

MARK: Yeah, thank you. Thank you-- first of all, thank you for having us. Second of all, thank you for joining us here on the podcast, as well. Before we get into all the fun and cool new NVIDIA things, do you want to tell us a little bit about yourselves before we get stuck in? Ian, you want to go first?

IAN: Yeah, sure. I'm Ian. I've been at NVIDIA for-- oh, it was 18 years now. I actually started very early on, finishing my PhD at Stanford and using GPUs for general purpose computing, and came to NVIDIA to actually start CUDA, which was our first-- the software programming platform for how people can use GPUs not just for playing video games, but for actually doing more general purpose computing. It's been an exciting ride. And today, I have a very interesting job of driving all of our GPUs into data centers around the world.

MARK: Yes.

JON: I'm really big on the gaming stuff, so I'm real excited about that-- but really interested in learning about the computing.

KARI: Hi, I'm Kari Briski, and I've been at NVIDIA for about 2 and 1/2 years. But I've been a leader in product management for about 15 years-- so really trying to focus on that end user experience and using our software.

MARK: Awesome. And you mentioned CUDA. That's a thing. Do you want to tell us a bit of the history of that, or just a broad overview of what that is and what that does, and all those things? We have half an hour. It's fine.

IAN: Yeah, sure. It's a kind of a fun story. Back in around 2000, obviously, the video game industry was driving a ton of investment in PCs and graphics accelerators for playing video games. As a result, those video games were getting more and more realistic and more and more programmable. A bunch of academics, myself included, realized that maybe these processors could be used for something other than playing "Quake" and "Doom" in the evening--

GAME: I'm pinned down!

IAN: --which, of course, we all were doing, too.

MARK: Which is fine.

KARI: Not that there's anything wrong with that.

IAN: So we started looking at these [INAUDIBLE]. And it turns out they were doubling in performance, like how many floating point operations they can do, literally every year, and repeatedly year over year. And they were pretty hard to program from a general purpose perspective using graphics APIs like [INAUDIBLE].

So I started a project early on at Stanford called Brook, which was to explore how we can program these things in a more general-purpose way: in a way that exploited their natural ability to double in performance, to defy Moore's law, but was usable by a general programming audience in C and other languages. That was a very successful project. We open sourced it, got it out there, got over 10,000 users, and then Jensen, our CEO, invited me to come to NVIDIA and do it for real.

So we started CUDA back in 2004, learned all those lessons of how to do parallel programming for the masses on a different kind of accelerator, a massively parallel one, a GPU, and took off from there. It launched in 2006, and it instantly became very successful in high performance computing and supercomputing, and today, in AI.

JON: Awesome. On behalf of many millions of gamers around the world, thank you.

IAN: You're welcome.

JON: And I'm really excited to see the big shift towards general computing and processing, so-- real excited.

IAN: What makes it so exciting is that it's so accessible. Anyone can go to Fry's, buy a laptop with a GPU, get access to the platform, download it for free, and even on a gaming PC, try it out and figure out what they can do. We hear all these stories of PhD students, graduate students, even undergrads blowing away their peers, professors, or classmates with the stuff they can do with GPUs that was never even imaginable when we started this whole thing.

MARK: Yeah Kari, I think probably you have a good perspective on-- where do you see CUDA and the things that people are doing with GPUs-- what are the applications people are doing with that right now?

KARI: Yeah, like Ian said, there's a variety of applications, with our focus on AI. The easy thing to say is that it's happening everywhere. But if I were to break down AI applications and what our customers are doing, I bucket them into three groups: computer vision, speech and audio, and machine comprehension, or NLP. And you can break those down into deeper, lower-level buckets of different types of neural networks, which we call the Cambrian explosion, because a neural network can be applied to almost every task in these buckets.

IAN: It's awesome to see things evolve over time. It all started with that first AlexNet. Alex Krizhevsky, a guy up in Canada, was working as a researcher with Hinton on deep neural networks, which had been totally forgotten about. He heard about this thing called CUDA, about the math that we were doing with HPC, and realized that it was the same math that he was doing in his AI research. Back then, it was still the AI winter.

He wrote the first accelerated framework for AI and invented AlexNet. That's why it's called AlexNet. It's named after him. And he did it on a bunch of gaming GPUs, two GTX 580s, way back in 2012. It all took off from there. Today, I think we have a massive collection of different kinds of neural networks, different applications of AI. And that's what makes it so fun. This whole field is constantly reinventing itself.

KARI: Which I think is funny-- he said, way back in 2012.

MARK: It's not that long ago, right?

KARI: It's not that long ago, and it's actually-- AlexNet is considered a toy now, considering the new neural networks, which are really deep and very complex.

JON: I actually still have my GTX 680, I think.

IAN: Oh, yeah.

JON: It's somewhere collecting dust. I'll probably end up using AlexNet and see if I can mess around and do some things.

IAN: Yeah, we've made things a little faster since then.

MARK: Just a little.

IAN: Those first GPUs, again, were designed for HPC computing use cases and of course, video games. But since then, we've been constantly reinventing our GPU. In fact, we pump through and create new architectures, new software stacks, literally every year, because that's how fast things are changing in this world.

MARK: Speaking of new things, I know you introduced something new recently, maybe something cool.

JON: Super awesome, yeah.

IAN: We just came out of our GTC conference, which is our main developer conference, where we invite the world to come. We share some of the things that we are working on, and they share the stuff that they've been doing on our platform, which is really exciting. We've been improving our AI performance for sure.

There are two parts of AI. There's the training part, where we're teaching neural networks to learn new things. One of the things that we've been doing is making it faster and easier to get the peak performance out of our GPUs, particularly in frameworks like TensorFlow, courtesy of you guys at Google, whom we work very closely with: getting access to the performance that's available in our latest GPU cores, called Tensor Cores. A Tensor Core is a mixed-precision core that can do things in 64-bit floating point, 32-bit floating point, 16-bit, even 8-bit, because AI can take advantage of all those different precisions and the different layers of performance.

The new thing we announced was TensorFlow AMP, which is our NVIDIA AMP acceleration into the TensorFlow framework to take advantage of that mixed precision automatically for users. It will automatically pick the right precision for the right layers of the neural network and maximize performance. We're seeing huge speedups. This is, like, 2-- 3x, even for AI training. And it's literally a one to two-line change in your TensorFlow script.
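A rough, hypothetical sketch of how "pick the right precision for the right layers" can work: AMP-style graph rewrites sort operations into ones that are fast and safe in FP16 (such as matrix multiplies and convolutions, which map onto Tensor Cores), ones that are numerically sensitive and stay in FP32, and ones that simply follow their inputs. The op lists and the function below are illustrative assumptions, not the actual lists or code that ship in TensorFlow. In NVIDIA's TensorFlow containers the real feature is switched on with the `TF_ENABLE_AUTO_MIXED_PRECISION=1` environment variable, and TensorFlow 1.14 later added `tf.train.experimental.enable_mixed_precision_graph_rewrite`; check the documentation for your version.

```python
# Hypothetical, simplified per-op precision assignment in the spirit of AMP.
# These op lists are illustrative assumptions, not TensorFlow's real lists.
FP16_SAFE = {"MatMul", "Conv2D"}               # Tensor Core-friendly ops
FP32_ONLY = {"Exp", "Log", "Softmax", "Sum"}   # numerically sensitive ops
# Any other op (e.g. "Relu", "Add") follows the precision of its inputs.


def assign_precisions(graph):
    """graph: list of (name, op_type, input_names) in topological order.

    Returns a dict mapping each node name to "fp16" or "fp32".
    """
    precision = {}
    for name, op_type, inputs in graph:
        if op_type in FP16_SAFE:
            precision[name] = "fp16"
        elif op_type in FP32_ONLY:
            precision[name] = "fp32"
        elif inputs and all(precision.get(i) == "fp16" for i in inputs):
            # Pass-through op whose inputs are all FP16 stays in FP16.
            precision[name] = "fp16"
        else:
            # Graph inputs and mixed FP16/FP32 joins default to FP32.
            precision[name] = "fp32"
    return precision


toy_graph = [
    ("x", "Placeholder", []),
    ("dense1", "MatMul", ["x"]),
    ("act1", "Relu", ["dense1"]),
    ("dense2", "MatMul", ["act1"]),
    ("probs", "Softmax", ["dense2"]),
]
print(assign_precisions(toy_graph))
# {'x': 'fp32', 'dense1': 'fp16', 'act1': 'fp16', 'dense2': 'fp16', 'probs': 'fp32'}
```

A real rewrite also has to insert casts wherever FP16 and FP32 regions meet, and mixed-precision training typically keeps an FP32 master copy of the weights.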

KARI: The goal was to give a great experience out of the box. The deep learning practitioner doesn't have to change the hardware or their hyperparameters, and they keep working in the framework they feel most comfortable with, which for a lot of people is TensorFlow. We also have some solutions for [INAUDIBLE], as well. AMP stands for automatic mixed precision, which is the training technique of taking input in FP16 and accumulating in FP32, which it turns out is really important for keeping the accuracy of your gradients through training.
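Kari's point about FP16 inputs with FP32 accumulation can be seen with a few lines of standard-library Python. This is a hypothetical numerical sketch, not NVIDIA code: it emulates FP16 and FP32 rounding with the `struct` module's `'e'` and `'f'` formats, then sums 10,000 small products using each accumulator precision.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, accumulate):
    """Dot product with FP16 inputs; `accumulate` sets the precision of the running sum."""
    total = 0.0
    for x, y in zip(a, b):
        # Inputs are stored in FP16; their product is exact in double precision.
        product = to_fp16(x) * to_fp16(y)
        # Round the running sum to the accumulator precision after every add.
        total = accumulate(total + product)
    return total


n = 10_000
a = [0.01] * n
b = [1.0] * n
print("FP16 accumulator:", dot(a, b, to_fp16))
print("FP32 accumulator:", dot(a, b, to_fp32))
```

The FP16 running sum stops growing at 32.0: above 32, adjacent FP16 values are 1/32 apart, so adding roughly 0.01 rounds back to the same number. The FP32 sum lands close to the true total of about 100, which is the gradient-accuracy effect Kari describes.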

MARK: You said a bunch of words there. Some of them, I might even know what they are. Can you break that down a little further?

KARI: Yeah, if I have three takeaways that I want you to take away, the first is that with AMP, automatic mixed precision, you only have to change two lines of code, and things just go way faster. So if you're a deep learning practitioner training neural networks, and maybe it takes a day to train, you can now do it in half a day, which means you can run double the experiments right out of the box without doing anything different. So that's kind of cool.

The second is that there is no accuracy-versus-speed trade-off for mixed precision training. I want to be pretty clear about that, because you say lower precision, and yeah, you're training in fewer bits, FP16 instead of FP32, but you're not losing accuracy. I want people to understand that. And the third takeaway is that automatic mixed precision training is general purpose across a variety of different applications.

IAN: Yeah, there's a lot of talk about precision and using all sorts of different kinds of numerical formats and math. It's really hard to get the peak performance without making any compromises. We've made that easier by putting all that smarts, all that intelligence, all those numerics and complex equations and loss functions-- more words-- into a box for you.

And that was one of the big engagements in talking with our customers, is that they want to get the best performance, but they're not the ninja data scientist, right? They just want to get the best performance out of the GPUs that they're buying. And we've made it a ton easier with AMP, and we're excited about it.
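Part of what goes "into a box" for mixed-precision training is loss scaling: very small gradients underflow to zero in FP16, so the loss is multiplied by a scale factor before backpropagation and the gradients are divided back out in FP32, with the scale adjusted automatically. The class below is a minimal sketch of that dynamic policy under stated assumptions: the starting scale of 2^15 and growth interval of 2,000 steps are assumed values, the gradients are scaled directly (equivalent to scaling the loss, by the chain rule), and none of this is NVIDIA's implementation.

```python
import math
import struct


def to_fp16(x):
    """Round to IEEE half precision; magnitudes beyond the FP16 range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


class DynamicLossScaler:
    """Hypothetical sketch of a dynamic loss-scaling policy for mixed-precision training."""

    def __init__(self, scale=2.0 ** 15, growth_interval=2000):
        self.scale = scale                      # assumed starting scale
        self.growth_interval = growth_interval  # assumed steps between scale increases
        self.good_steps = 0

    def unscale(self, true_grads):
        """Emulate FP16 backprop of a scaled loss, then unscale the gradients.

        Returns the unscaled gradients, or None when the step must be skipped.
        """
        scaled = [to_fp16(g * self.scale) for g in true_grads]
        if any(math.isinf(g) or math.isnan(g) for g in scaled):
            self.scale /= 2.0      # overflow: shrink the scale and skip this update
            self.good_steps = 0
            return None
        grads = [g / self.scale for g in scaled]  # unscale with the scale actually used
        self.good_steps += 1
        if self.good_steps == self.growth_interval:
            self.scale *= 2.0      # no overflow for a while: try a larger scale
            self.good_steps = 0
        return grads


print("FP16, no scaling:", to_fp16(1e-8))  # 0.0: the gradient underflows
scaler = DynamicLossScaler()
print("FP16, scaled    :", scaler.unscale([1e-8]))
```

Shrinking the scale on overflow and growing it after a long clean run keeps gradients inside FP16's range without hand-tuning, which is one reason mixed precision can match FP32 accuracy.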

JON: So GCP and NVIDIA-- what kind of hardware are we launching together?

IAN: GCP has been a great partner. You guys always want to put the best GPUs in the cloud for your customers.

MARK: We try.

IAN: And you have since the days of Kepler, which has been fantastic. We now have our Volta V100 GPUs in the cloud with Tensor Cores. AMP is specifically designed to run those lightning fast. And then just last year, we also announced our T4 GPUs with GCP.

This is a little bit different kind of GPU. It's designed for inference, the other side of AI. One is training, teaching a neural network how to recognize things, speech, the words I'm saying, what's inside an image. That's the learning part.

Then there's the deployment part: the scale-out, the running of the services, the deployment of AI. In the past, that used to run on whatever CPUs you had, and typically there was a compromise. You picked a small neural network, or maybe you couldn't do everything. The new AI models coming out now are incredibly intelligent, very powerful, impressive--

KARI: Complex, yeah.

IAN: They're also very big-- lots of floating point operations. And this has to be realtime. It's got to recognize my speech as fast as I'm talking right now, which is probably too fast.

MARK: You're fine. You're good.

IAN: That's why people are now moving inference to acceleration. So you guys were the first, actually, to launch T4 in the cloud. Congratulations. And we've been working together to make inference even faster jointly with customers.

KARI: Yeah, and just for the audience out there, inference is taking that trained model and using it for predictions. I think a lot of people use the term predictions, and we use the term inference. Yeah.
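A deliberately tiny, hypothetical illustration of the split between training and inference: training repeatedly nudges parameters using gradients computed from labeled data, while inference just runs the learned model forward on new input. A one-variable linear model stands in for a neural network here; real models have vastly more parameters, but the division of labor is the same.

```python
def train(xs, ys, lr=0.05, epochs=1000):
    """Training: learn w and b for y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(model, x):
    """Inference: one forward pass with frozen parameters, no gradients."""
    w, b = model
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
model = train(xs, ys)       # expensive, done once (or periodically)
print(round(infer(model, 10.0), 2))  # cheap, done for every request: 21.0
```

Training is the costly, offline loop; inference is the per-request forward pass that has to be fast at scale, which is the workload T4 targets.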

MARK: What kind of speed increases are we talking about here?

IAN: Orders of magnitude. I mean, some stuff was not possible before. Processing audio, doing intelligent decision-making-- if I ask a question, I want to have an intelligent answer back, not just what Wikipedia says.

Or actually be able to produce speech-- I don't think people realize that when computers talk back to you-- you may have noticed in the last few years, it's gotten really good. It's really good at sounding like a human. That's actually because of a new neural network technology called WaveNets. WaveNets actually take in, basically, a robotic-sounding voice and output a nice-sounding voice like you and I.

JON: Oh, I think I've actually heard of that. It's like, you read off roughly around 200 lines, and it starts to imitate your own voice so it can read back to you.

IAN: And this is exactly how text-to-speech services work. They have a lot of applications, not just the home speaker device, but also in businesses and call centers and in how we talk to computers through automation systems. If they sound natural, people want to engage with them.

And it goes right to the bottom line of engagement scores and CSAT scores for businesses. So this is one of the new hot areas in AI. It's one of the big demand generators for our GPUs, because you just can't run WaveNets on legacy platforms. They're too big and too complicated. And they're too important not to get right for those kinds of services.
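In the published WaveNet design, the network generates the raw audio waveform one sample at a time, using stacks of dilated causal convolutions whose dilation doubles at each layer. The plain-Python sketch below (fixed illustrative weights, not DeepMind's implementation) shows how that doubling gives each output a wide window of past audio cheaply, and checks that no output depends on future input.

```python
def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution: out[t] = sum_k weights[k] * signal[t - k*dilation].

    Only the current and past samples are used; indices before 0 count as zero.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def conv_stack(signal, dilations, weights=(0.5, 0.5)):
    """Apply one kernel-size-2 causal convolution per dilation (illustrative fixed weights)."""
    for d in dilations:
        signal = causal_dilated_conv(signal, weights, d)
    return signal


def receptive_field(dilations, kernel_size=2):
    """How many input samples (current one included) a single output can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512: doubling per layer
print(receptive_field(dilations))  # 1024

# Causality check: an impulse at t=5 only affects outputs at t >= 5.
impulse = [0.0] * 32
impulse[5] = 1.0
print(conv_stack(impulse, [1, 2, 4, 8]))
```

Ten dilated layers see 1,024 samples, where ten undilated layers would see only 11. Because each new sample depends on the ones generated before it, producing one second of audio at, say, a 16 kHz sample rate means 16,000 sequential passes through the network, which is why this kind of model puts so much pressure on inference hardware.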

KARI: Yeah, and you asked exactly what kind of speedups. It does depend on the model, and we call it finding your quality of service. If you have ensemble models, where the output of one model feeds into the input of another, they all have to run with, like, enormous cosmic power in an itty-bitty living space of latency. When you add all that latency up, it still has to deliver that delightful end-user experience. So if you speak a search query and it synthesizes speech back to you, that all has to happen in a nice, tiny window.

IAN: Yeah, so latency matters. If I'm going to ask a question, I can't wait a second for an answer to come back. I just won't use the service. I won't use it on my phone [INAUDIBLE].

So you have to deliver responses in, you know, a tenth of a second; after that, the enjoyment score starts to fall off and people stop using it. So we are seeing throughput improvements and speedups in the 10 to 100x range by moving onto GPUs, especially for some of the new models like WaveNets, and for natural language processing, like the new model called BERT.
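The point about chained models can be put as simple arithmetic: when the output of one model feeds the next, their latencies add, and the total has to fit inside the roughly 100 ms ("a tenth of a second") window Ian describes. The stage names and millisecond figures below are hypothetical, chosen only to illustrate the arithmetic, and the 10x factor is simply the low end of the speedup range quoted in the conversation.

```python
# Hypothetical per-stage latencies in milliseconds for a voice-assistant pipeline.
# Stage names and numbers are illustrative assumptions, not measurements.
BUDGET_MS = 100.0  # "a tenth of a second"


def pipeline_latency(stage_ms):
    """In a chained ensemble each model waits for the previous one, so latencies add."""
    return sum(stage_ms.values())


def fits_budget(stage_ms, budget_ms=BUDGET_MS):
    """True if the whole chain finishes inside the latency budget."""
    return pipeline_latency(stage_ms) <= budget_ms


cpu_pipeline = {"speech_to_text": 120.0, "language_model": 200.0, "text_to_speech": 400.0}
# Assume a uniform 10x speedup per stage, the low end of the quoted range.
gpu_pipeline = {name: ms / 10 for name, ms in cpu_pipeline.items()}

print(pipeline_latency(cpu_pipeline), fits_budget(cpu_pipeline))  # 720.0 False
print(pipeline_latency(gpu_pipeline), fits_budget(gpu_pipeline))  # 72.0 True
```

Because every stage's latency counts against the same budget, no single model in the chain can be slow, which is why the whole ensemble has to be accelerated rather than just one piece.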

MARK: What is this?

IAN: Yeah. These guys come up with great names! I encourage anyone listening who's actually working on data science--

KARI: An Ernie.

IAN: We need an Ernie. So BERT stands for-- Kari, can you do it?

KARI: Bidirectional Encoder Representations from Transformers.

MARK: Neat. What does that mean?

IAN: BERT is the new neural network that's used to understand language, what we call natural language processing. You give it a web page: please summarize that web page. Understanding text and content and languages.

As I'm speaking, understanding what I mean and what the sentiment is, and summarizing it. This is very important for things like advertising and search. I need to know: is this tweet a good tweet? A bad tweet?

Who should I share it with? Is it positive toward so-and-so? Or negative?

It's the new model that has come up. It's been talked about as the grand unifying natural language processing model that can understand all language. So they have many different models for different use cases. This is the latest one.

KARI: It's a great multitasker.

IAN: It's huge. It's gigantic. But it's very intelligent and very powerful. It's being used already in production today with certain companies for doing things like sentiment analysis, content filtering.

Is this a safe thing to share? Not a safe thing to share? It's a very important part of AI today, and one of the driving forces for moving AI to GPUs.
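The "bidirectional" in BERT's name refers to its self-attention: when encoding a word, every position can attend to words on both sides of it, rather than only to earlier words as in a left-to-right language model. The toy sketch below, in plain Python, contrasts the two by computing softmax attention weights with and without a causal mask. The tokens and scores are hypothetical; in a real Transformer the scores come from learned query and key projections.

```python
import math


def attention_weights(scores, causal=False):
    """Row-wise softmax over a square score matrix.

    causal=True: position i may only attend to positions j <= i (left-to-right).
    causal=False: every position attends to the whole sequence (bidirectional).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        allowed = [j for j in range(n) if not causal or j <= i]
        m = max(scores[i][j] for j in allowed)  # subtract max for numerical stability
        exps = {j: math.exp(scores[i][j] - m) for j in allowed}
        total = sum(exps.values())
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights


tokens = ["the", "bank", "of", "the", "river"]
# Hypothetical scores: "bank" (position 1) would like to look at "river" (position 4).
scores = [[0.0] * len(tokens) for _ in tokens]
scores[1][4] = 2.0

bi = attention_weights(scores)                # BERT-style, both directions
uni = attention_weights(scores, causal=True)  # left-to-right only
print(round(bi[1][4], 3), uni[1][4])  # 0.649 0.0
```

Only the bidirectional version lets "bank" use "river," which appears later in the sentence, to settle which meaning of "bank" is intended.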

JON: Our listeners obviously can't see me, but I have this huge grin on my face. So many light bulbs are going off in my mind right now for future projects. This sounds completely awesome. So, I'm curious: what's the availability for T4?

IAN: It's available today. People can rent it right now from GCP. If you're an on-prem customer, it's also available in servers. We're available everywhere. You know, our mission is to make our technology available through every channel.

The cloud is a great time to market, especially GCP. So we're excited that you guys are leaning in and getting our latest technology out there. We also make our software available. So we do a huge amount of work. And people don't always understand this.

We do that work to accelerate the AI frameworks like TensorFlow, and the rest of the inference software stack, so that users of the cloud can very quickly deploy these models or build services on top of them. We publish optimized framework containers about every month now across the board: TensorFlow, but also PyTorch and [INAUDIBLE], and the other guys. We also publish a bunch of our inference software, which Kari can explain more, so customers don't have to build AI from scratch but can start from standard building blocks. And all that runs on GCP right out of the gate.

KARI: Yeah. So, Ian, like you said, T4 has a little bit of everything for everyone: it's got FP64, it's got FP32, you get FP16, you get [INAUDIBLE]. And our software is built so that you can actually flip a switch and exploit features like the Tensor Cores inside the hardware. And I want to say that FP16 is a great format for inference.

I mentioned earlier that you need mixed precision for training, but you don't actually need that for inference. You can literally just flip a switch to FP16 and know that you'll get the same sort of accuracy. With INT8, there's a little bit more to consider, both in your training and in the quantization of the weights and activations. But you do get benefits in memory footprint and power when you take it down to this reduced precision.
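The extra consideration for INT8 is quantization: mapping FP32 values onto 255 integer levels with a scale factor chosen by calibration. The sketch below uses the simplest symmetric scheme, where the largest magnitude observed maps to 127. It is an illustrative assumption, not TensorRT's algorithm; TensorRT's own INT8 calibration is more careful about outliers (for example, it offers entropy-based calibration for activations).

```python
def calibrate_scale(values):
    """Symmetric max-abs calibration: the largest magnitude seen maps to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """FP32 -> INT8 codes, rounded (half to even) and clipped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """INT8 codes -> approximate real values."""
    return [c * scale for c in codes]


weights = [0.42, -1.27, 0.003, 0.9, -0.5]
scale = calibrate_scale(weights)
codes = quantize(weights, scale)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)                    # [42, -127, 0, 90, -50]
print(max_err <= scale / 2)     # rounding error is at most half a quantization step
```

Each value now takes one byte instead of four, which is where the memory and power savings come from. Anything outside the calibrated range is clipped to ±127, so activations are calibrated on representative input data.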

And we offer software that can do that for you. One thing that we did is with this software framework called TensorRT: we've integrated it into TensorFlow. It's a lot of tensors going on.

MARK: It's all right. You have GTC, we have GCP. It's all fine.

KARI: Yeah, exactly. So what we've done is we've taken that software that takes the best advantage of optimizations, and put it right into where people do most of their work, which is TensorFlow.

IAN: And then we push all this software on the web free to download as containers. They can pull from--

MARK: [INAUDIBLE] seem like Docker containers. And like, it's all standard.

IAN: It's all standard. It's all in Docker containers. In fact, we also have examples of all the models, so they don't need to download several models from everywhere. It's all captured in the containers as a straight-up building block to fire up, run, and create a service on top of.

So if you want the latest BERT that's been optimized for GPUs, or the latest ResNet-50, or whatever have you, you know, feel free to come to nvidia.com. They're on-- it's actually ngc.nvidia.com. You can download all those bits and pieces. They're all qualified and run on GCP.

We even have some HPC stuff on there. There are some base CUDA containers, if you want to actually get your hands dirty and program GPUs directly. All that software is freely available and maintained and ready to run on GCP's cloud. Again, the URL is GC--

MARK: We'll put a link in the show notes. Make that easy. Saying URLs is complicated and hard.

KARI: Yes.

IAN: You know, we're seeing that shift from on-prem to cloud happening in all sorts of places, not just, obviously, in the AI world, but also in the traditional computing space: the high performance computing guys, the people doing, you know, simulations for drug discovery, or understanding astrophysics and how galaxies and stars-- we do a lot of that work too. And actually, all that can also be done on the same GPUs in the cloud. So, you know, the research community, the higher-ed guys, people doing science and discovery can take advantage of these GPUs as well.

JON: That's awesome.

MARK: No, that's very interesting. Now I'm actually kind of curious. You brought the on-prem discussion in, which is always interesting. Where do you see people using the cloud versus on-prem, and what are the trade-offs there? And what helps people make those sorts of decisions about using GPUs on servers, essentially?

IAN: A lot of this new AI technology is being invented in the cloud. That's what makes it so exciting. And what I love about the cloud is the accessibility. You can just get access right away. To try it, to use it, it's a few bucks an hour, or even cents an hour, to get access to a GPU and start using GPUs for AI or CUDA or whatever, and you don't have to go through IT.

We do see trade-offs in the larger-scale business decisions. What makes sense to keep on-prem versus pushing to the cloud? Some of that's economics, for sure, but other factors are, you know, security, or whether or not they just want to maintain that as their own enclave that they can control, because it's really important for their businesses.

From our standpoint, we're agnostic. My job is to make our technology available through all the channels. And I certainly like working with GCP because I know how aggressive you guys are at bringing latest technology to the cloud.

KARI: One of the big deals for on-prem versus cloud, and some of the work we've done, is with the Kubernetes community: making sure that GPUs are first-class citizens, so that the work you're doing is almost completely vanilla. You can have your Kubernetes cluster deployment on-prem, and then you can burst to the cloud with the same settings. We've done a lot of work to make sure it's pumping out metrics like GPU utilization and power, so that you, the system administrator, can answer: what is the health of my system?

And how do GPUs fit into that? We want to make it as vanilla as possible. We don't want GPUs to be viewed as like, this unique and beautiful [INAUDIBLE], which they are.

MARK: Of course! They're lovely. Don't get them wrong.
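To make Kari's "vanilla" point concrete, here is a minimal sketch, assuming a cluster that runs the NVIDIA Kubernetes device plugin, which advertises GPUs to the scheduler as the extended resource `nvidia.com/gpu`. The helper name `gpu_pod_manifest`, the pod name, and the image name are hypothetical; the point is that a GPU workload is an ordinary Pod spec with one extra resource limit, so the same manifest can be applied on-prem or in the cloud.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs.

    Assumes the cluster runs the NVIDIA device plugin, which advertises GPUs
    to the scheduler as the extended resource "nvidia.com/gpu".
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested via limits;
                    # the scheduler only places the pod on a node with free GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod and image names; kubectl accepts JSON manifests as well as YAML.
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/my-cuda-app:latest")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON and running `kubectl apply -f` on it should behave the same on an on-prem cluster and a cloud cluster, provided both have GPU nodes with drivers and the device plugin set up.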

KARI: I did want to bring up the inference server a little bit. You have these three basic blocks when you're working through AI deployments. You have the pre-processing of the data, you have the training, and then, maybe if you're really an expert, you have optimization, and then deployment.

And I hate to use something as mundane as DevOps for something as exciting as AI, but that's kind of what it is. For people who might be scared about where to begin with AI model development, you just take that same practice and approach to it. And we have tools at every step along the way for every person: whether you're doing data pre-processing, you're a deep learning data scientist or researcher, you're the software developer trying to integrate the model into an application, or you're the system administrator trying to take care of all this stuff and keep the environment healthy.

IAN: Yeah. So you can see we're pretty active at all layers of that software stack, not just the cutting edge of actual development and new AI usage, but all the way down to system management and monitoring. And like Kari said, with all the excitement around Kubernetes, we're definitely in there, working very closely with that community to make sure it's all GPU-ready. The other area that's hot right now is traditional machine learning. We talk a lot about AI, but a lot of that math is reinvigorating data science.

You know, there are a lot of algorithms out there that are well proven and very successful for predictive analytics, like more traditional linear regression, k-means clustering, and gradient boosted decision trees. We've also started to accelerate those workloads. It's similar math and operations to what we were doing before.

We've integrated with some of your Cloud ML pipeline workflows at Google to accelerate some of the algorithms used for traditional predictive analytics, particularly gradient boosted decision trees, or XGBoost, which comes up a lot in forecasting. When am I going to run out of bananas, and when should I ship more bananas? There's point-of-sale data for every retail store all around the world, and data scientists are drowning in data. They just lack the compute to actually make those kinds of predictions, and often in the time when it matters, which can be literally just hours.

So we're starting to see much more acceleration of traditional machine learning. We have a software stack called RAPIDS, which takes a collection of those algorithms, GPU-ifies them, and makes them available for people to integrate into their predictive analytics platforms. In particular, we've been working with Google on accelerating those as well. That's some of the stuff we've been talking about here at GTC. And it's exciting to see not just deep learning, but all of machine learning, recognize the opportunity that GPUs bring to accelerating all those kinds of predictive workloads.
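Ian mentions gradient boosted decision trees, the family of models XGBoost implements, for forecasting. As a rough, CPU-only sketch of the core idea, and not of how XGBoost or RAPIDS work internally: each round fits a tiny tree to the current residuals and adds a damped copy of it to the model. The depth-1 "stump" learner, the learning rate, and the banana-sales numbers are all illustrative assumptions; real libraries grow deeper trees and, in XGBoost's case, also use second-order gradient information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stump:
    """A depth-1 regression tree: one threshold, two constant leaves."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def fit_stump(xs: List[float], targets: List[float]) -> Stump:
    """Pick the split that minimizes squared error on the targets."""
    best: Optional[Stump] = None
    best_err = float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best, best_err = Stump(t, lv, rv), err
    if best is None:  # every x is identical, so no split exists
        mean = sum(targets) / len(targets)
        best = Stump(xs[0], mean, mean)
    return best


class BoostedStumps:
    """Gradient boosting for squared loss with stumps as weak learners."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1) -> None:
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, xs: List[float], ys: List[float]) -> "BoostedStumps":
        self.base = sum(ys) / len(ys)  # start from the mean prediction
        self.stumps = []
        preds = [self.base] * len(ys)
        for _ in range(self.n_rounds):
            # For squared loss, the negative gradient is just the residual.
            residuals = [y - p for y, p in zip(ys, preds)]
            stump = fit_stump(xs, residuals)
            self.stumps.append(stump)
            preds = [p + self.learning_rate * stump.predict(x) for p, x in zip(preds, xs)]
        return self

    def predict(self, x: float) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


# Hypothetical daily banana sales that jump after a promotion starts on day 5.
days = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sales = [10, 10, 10, 10, 10, 20, 20, 20, 20, 20]
model = BoostedStumps().fit(days, sales)
print(round(model.predict(2), 1), round(model.predict(7), 1))  # 10.0 20.0
```

Each round shrinks the remaining error by a factor of (1 - learning_rate) on this toy data, which is why a small learning rate needs many rounds. That many-rounds, many-rows workload is the part GPU libraries parallelize.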

MARK: Where should developers, data scientists, anybody who wants to use these interesting things, go to get started?

KARI: Everywhere. So, I say the--


MARK: That narrows it down nicely.

KARI: No, honestly, like Ian said earlier, the easiest place is NGC.NVIDIA.com. We have a lot of containers. We have examples. We have models and scripts with exactly the recipes we've applied to get the best performance and state-of-the-art accuracy for all of these models. Like Ian said before, we work really hard to put this into open source software as much as possible. So if you're using a GCP VM, we actually stay in contact with their VM team to make sure they're getting the latest optimizations we've contributed to TensorFlow, so that they're working perfectly on our latest GPUs.

IAN: We've also been providing educational services. At our GTC conference, we host many hackathons and many tutorials. We have our own Deep Learning Institute, which we call DLI.

Frequently, customers come to us and they want to learn about AI. So we have a program where anyone can come in and sign up. And we'll run a Deep Learning Institute at their company. We'll show them how to use the latest [INAUDIBLE]. We'll show them how to configure their GPUs to get them started.

They'll actually train a neural network and go use it, and by working through a canonical toy use case, they can turn around, go back to their companies, and go from being just a DevOps person or just a traditional data scientist to being an AI expert, and figure out how to apply these things to their business use cases. We're very active in that community. We're obviously working with many customers that want to get started on GPUs, want to get started on AI, and just don't know how. We see it as our mission to help educate.

KARI: And I know we said it earlier, but the product manager in me really cares about the end user, their persona, and what they're trying to do in their job. So if you're a system administrator, you can feel good about our contributions to Kubernetes. You can check out our data center GPU management utilities. And again, all of this is on NGC.NVIDIA.com.

If you're a deep learning practitioner, definitely check out AMP, which again is in the framework you use most, to take advantage of mixed precision training. We've also made Kubeflow Pipelines available: we've contributed examples directly into the Kubeflow Pipelines repository across a variety of applications. We're entering one for medical imaging, one for intelligent video analytics, and one for transfer learning.

And then we're also doing something like the Hello World with ResNet-50, deploying it on an inference server with Kubeflow Pipelines into Kubernetes. And if you're the software developer who really doesn't know anything about deep learning models, and you just know you have to put one into your software application so you can say, yeah, I'm using AI, definitely check out our inference server, which is also open source software.
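Kari recommends AMP for mixed precision training. Mixed precision schemes commonly pair FP16 math with loss scaling, so that small gradients do not underflow to zero. Below is a minimal, framework-free sketch of dynamic loss scaling, assuming a training loop that hands it gradients computed from the scaled loss. The class name, method names, and default values are hypothetical illustrations, not TensorFlow's or NVIDIA's API; plain Python floats are 64-bit, so the FP16 overflow is simulated by passing in `inf`.

```python
import math
from typing import List, Optional


class DynamicLossScaler:
    """Minimal dynamic loss scaler (hypothetical helper, not a framework API).

    The loss is multiplied by `scale` before backprop so that small FP16
    gradients do not underflow; gradients are divided by `scale` afterwards.
    """

    def __init__(self, init_scale: float = 2.0 ** 15, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def scale_loss(self, loss: float) -> float:
        return loss * self.scale

    def unscale(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Return unscaled gradients, or None if the step should be skipped."""
        grads = [g / self.scale for g in scaled_grads]
        if not all(math.isfinite(g) for g in grads):
            # Overflow (inf/nan): shrink the scale and tell the caller to skip this update.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long run of clean steps: try a larger scale to keep more precision.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return grads


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.unscale([float("inf")]), scaler.scale)  # None 4.0
```

The design choice is to be optimistic: the scale keeps growing until an overflow proves it too large, then backs off, so it hovers near the largest value the gradients can tolerate.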

MARK: One of my favorite questions to ask: what's been the most interesting or potentially weird or wacky thing that you've seen people do with AI or with NVIDIA chips?

KARI: What's kind of wacky-- you know, one thing that's gotten a lot of notice lately is the StyleGAN face synthesis. It's gotten a lot of press and attention, and I think there's even a site where you can click and try to guess which face is real. That's been kind of the neat one lately.

JON: I've seen my face plastered on a few people, which is very interesting to watch.

MARK: It's fine.

IAN: There are all sorts of applications of AI that I've seen. I think some of the coolest ones are actually in agriculture. People don't think about it that much. But I believe there's a cow with a Fitbit on it. And they're trying to figure out the--

MARK: Getting your steps in?

IAN: Yeah. Trying to figure out the health of cows, improve milk output, understand when cows are sick. Using AI to understand herds. Very cool stuff.

There's actually some interesting work in strawberry picking. I didn't know this, but apparently strawberry fields need to be picked every three days. So it's not like a crop you grow and pick once. You have to constantly be picking it.

And it's an area that is very labor intensive, and obviously it's hard to find all the farm labor that can constantly come and pick every three days. So there's all this cool stuff in robotics right now. There's a strawberry picker being developed with GPUs that has a little articulating arm and a camera. And it's actually a really hard problem.

You've got to do segmentation: which pixels are which strawberries, even if they overlap. And what's the ideal place to grab the stem and cut it? You never want to actually touch the strawberry, because that could potentially transfer fungus or mold, and that kind of stuff.

MARK: And you don't want squished strawberries.

IAN: You don't want to squish them. And you got to do it every three days. So there's some really cool stuff happening in robotics. And that I think is some of the next generation stuff that we're going to start seeing coming out of AI. Just very convincing, very compelling, very interesting robotics applications that are going to make our lives a lot easier.

JON: Being a gamer, I do happen to have an RTX 2080. I'm curious: will we see the same technology that's being introduced in RTX in the GPUs that are going into data centers?

IAN: Yeah. One thing we've always done is make our technology available on all of our GPUs. What's made us so successful is the fact that anyone can get access to the latest NVIDIA tech, even if they're a gamer. Because many of the people actually inventing the next generation of stuff, whether it be science or AI, are probably gamers too.

So with RTX, actually, all the Tensor Core capabilities are there and available. We're already starting to put them into games. In fact, there's work being done on final frame rendering. At the very end of rendering a frame of the game, we can apply AI to it to improve its visual quality. We spent a lot of work training the neural network to improve the image quality.

We actually render it at much higher resolution and then downscale, and teach an AI how to do automatic color correction to eliminate jaggies, to do anti-aliasing, using AI on every frame. Yeah, it's millions of pixels, 60 times a second. We actually run a full neural network on every one of those pixels to do a final anti-aliasing pass on your GPU. It's a really hard thing. I mean, that's real, real time.

JON: I can imagine.
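Ian describes rendering at much higher resolution and then downscaling to teach the network what a clean frame looks like. The simplest way to turn a supersampled render into such a smooth reference is a box filter that averages each block of high-resolution pixels into one output pixel. This is a minimal sketch on a grayscale image stored as nested lists; the function name and the toy image are assumptions for illustration, not NVIDIA's actual training pipeline.

```python
from typing import List

Image = List[List[float]]  # grayscale image, row-major, values in [0, 1]


def box_downsample(img: Image, factor: int) -> Image:
    """Average each factor x factor block of a supersampled image into one pixel."""
    height, width = len(img), len(img[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    out: Image = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [img[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A hard, aliased diagonal edge rendered at 2x resolution...
hi_res = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
# ...becomes a smooth reference frame whose edge pixels hold fractional coverage.
print(box_downsample(hi_res, 2))  # [[0.75, 0.0], [1.0, 0.75]]
```

For scale, at 1920 × 1080 the "60 times a second" Ian mentions works out to 2,073,600 pixels per frame, or about 124 million pixels per second, which is why the per-pixel network has to run on the GPU.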

IAN: That's why I think AI is going to affect and impact and improve every one of the applications of GPUs in every industry, not just agriculture, but gaming, you know, automation, robotics, the cloud. It's super exciting.

MARK: We are running out of time just a little bit. But before we go, is there anything that you want to make sure that these listeners hear about, or you want to plug? Any events you're going to be at? Or any new content that's been delivered recently that you want to make sure people are aware of?

KARI: Yeah, so I think our biggest event has already passed. But you can always visit our booth or, you know, come see us at the top AI conferences. Some examples are ICML, ICLR, and CVPR. There are a lot of acronyms here. But if you know them, they're speaking to you.

IAN: We're also attending more DevOps and cloud conferences, obviously. We'll be at Google Cloud Next, and we'll be at KubeCon and other places.

So reach out. Feel free to contact NVIDIA. Again, we're working on all of the stacks and the applications. So please don't hesitate.

JON: Well, that's going to wrap it up for us. We really, really enjoyed this. The light bulbs are still going off. Once again, I'd like to thank you guys for joining us on the GCP Podcast.

KARI: Yeah, thank you.

IAN: Thank you. Bye.

MARK: Thanks again to Ian and Kari for joining us on the podcast and telling us all the amazing things that are happening with the NVIDIA T4s, but also thank you so much to the NVIDIA team for hosting us at the NVIDIA office. It is a gorgeous office, if you ever get down that way.

JON: It is super amazing. And I am a big fan of dark tinted glass. So it is amazing.

MARK: Awesome. Yeah, so thanks so much for that. So before we wrap up, a question of the week. I know you've been digging into this a little bit, Jon.

JON: Just a bit.

MARK: So if people are interested in Stadia, and they want to learn more either as a potential consumer or a potential developer, where can those people go?

JON: So the best place to go would be Stadia.dev. It's the new landing page for Stadia. There, you can find all the general information that you need. And if you would like to sign up for developer access, the best place to go is Stadia.dev/apply.

MARK: Well, Jon, where are you going to be? What are you doing? What's going on? You do anything special?

JON: Maybe a couple of special things. I'm going to be at Cloud Next in a couple of weeks. I will also be at ECGC--


--in mid April. And I will also be at Vector 2019, which will be very amazing. I have also jumped on board to be at I/O this year. So this would be great, because it's my first I/O.

MARK: How fun.

JON: And I will potentially be at Unite Shanghai--


--which will be a great experience.

MARK: Oh, congratulations on working out what you're doing post I/O.

JON: Yeah.

MARK: So yeah, I will be at Cloud Next as well. I will be hanging out at the GCP Podcast booth. So if people are going to be at Next, definitely come by the GCP Podcast booth. I'll be there. Jon will be there.

JON: I will.

MARK: A variety of hosts that we have will be there. Gabby will be there, et cetera, et cetera. So definitely come by.

We'll be doing some recording on the show floor and doing live episodes, all the usual good stuff. I will also be at East Coast Game Conference, just like Jon. And then I will also be at I/O as well. I/O is pretty exciting. We'll tell you more about that as we get a bit closer.

JON: Yeah.

MARK: Fantastic. Jon, thanks so much for joining me yet again on the podcast this week.

JON: It's always a pleasure.

MARK: Wonderful. And thank you all for listening. And we'll see you all next week.


Thanks again to Ian and Kari for joining us, and also allowing us to have a wonderful visit at the NVINDI-- NVIDIA. NVIDI--




Mark Mandel and Jon Foust
