Virtual Machines with Scott Van Woudenberg
Scott Van Woudenberg spent sixteen years as a software engineer and engineering lead/manager before moving over to product management. He joined Google in 2012 as a Product Manager on Google Compute Engine, mere weeks before its public alpha launch at I/O.
He’s remained a PM on GCE, helping to guide and build the service into a GA product which has seen exponential growth every year since going public.
Cool thing of the week
- Kubernetes class on Udacity post
- Google Compute Engine docs
- Virtual Machine wikipedia
- Google Cloud Spin: Stopping time with the power of the Cloud - Google I/O 2016 video
- Google Cloud Functions docs
- Compute Engine Pricing docs
- Custom Machine Types docs
- Resize a Persistent Disk docs
- Autoresizing Persistent Disks in Compute Engine blog
- Google Compute Engine uses Live Migration technology to service infrastructure without application downtime post
- Google Cloud Platform Locations map
Question of the week
- Creating Datastore Backups docs
- Scheduled Backups docs
- Creating a Cloud Datastore backup in BigQuery docs
Where can you find us next?
- Mark will then be at Change the Game SF on Friday!
- Francesc will be riding the AIDS/Lifecycle and if you want you can donate.
Transcript
FRANCESC: Hi, and welcome to episode number 28 of the weekly Google Cloud Platform podcast. I am Francesc Campoy, and I'm here with my colleague Mark Mandel. Hey, Mark.
MARK: Hey, Francesc. How are you doing today?
FRANCESC: Very good. Very good. Very interested in the main content of the week.
MARK: Yeah, we're moving away from some of the shiny things, sort of, stepping away from containers, stepping away from App Engine, going, like, let's just talk about VMs.
FRANCESC: Yeah, what is a virtual machine? How do you use them at Google? How are ours better than others? Et cetera.
FRANCESC: I think it's gonna be very interesting. We're gonna have--what's his name?
MARK: Scott Van Woudenberg will be joining us.
FRANCESC: Scott Van Woudenberg, yes, and very good pronunciation of that one, by the way.
MARK: Thank you. Thank you. I've been working on my pronunciation of names.
FRANCESC: Yeah. So yeah, we're gonna--we're gonna have him and then, afterwards, we're gonna have a question of the week that comes from one of our listeners, Noam Frankel, on--
MARK: He's looking at how we can connect Datastore backups to BigQuery.
FRANCESC: Yeah, exactly. So yeah, we're gonna--we're gonna end up doing that question of the week at the end of the episode. But before, we have the cool thing of the week.
MARK: Yeah, we have this really great new Udacity course from two of our team members, Kelsey Hightower and Carter Morgan, who have done a Kubernetes online university course.
FRANCESC: Yep, and it is pretty amazing, just because seeing those two people working together is really amazing.
FRANCESC: They're really, really fun. But also, basically, it's gonna take you from the basics to what is a container and how do you use them to generate--well, to create microservices to then create Kubernetes clusters, and then deploy microservices on them, and basically everything from scratch to having your own microservices running on the cloud.
MARK: Yeah, and best of all, it's free.
FRANCESC: Oh, yeah. Absolutely. It's absolutely free, which is awesome, and you can find a link on our show notes.
MARK: Yeah. Yeah, should be very interesting to get done.
FRANCESC: Yeah, I'm definitely gonna go through it just--I'm pretty sure I know most of it. I'm sure I will learn some of--some new stuff...
FRANCESC: But I think it's just gonna be fun to see them presenting.
MARK: Sounds like it, yeah. That sounds really, really good.
FRANCESC: Cool, so what do you think? Should we go talk to Scott?
MARK: That sounds like a great idea. Let's talk about virtual machines. We are joined today by Scott Van Woudenberg, PM on Compute Engine. Scott, how are you doing today?
SCOTT: Good. Doing well. Thanks, guys. Thanks for having me on.
MARK: Absolutely. Before we get stuck into what you do at Google and, like, the products you work on. Why don't you tell us a little bit about yourself and your role and all that sort of fun stuff?
SCOTT: Sure. So I'm a product manager on Google Compute Engine, which is our infrastructure as a service offering, which is part of our broader Google Cloud Platform. Before Google, I was about--spent about 16 years as a software engineer and engineering lead and manager, kind of, moving up the chain on the engineering side of the house before I decided to switch to product management. So I came to Google in 2012 to be a PM on Compute Engine, and I've been doing it ever since. And right around the time I joined, like, basically weeks after, like, a few weeks after I joined, GCP was the--announced publicly for the first time. We kind of announced our--we called it limited preview at the time, but it's basically our alpha launch at that year's Google I/O. So it's been a very exciting ride to ride the exponential growth curve that we've seen since we started and especially since we GA'd back in 2013.
MARK: Excellent. All right, wonderful. Well, thanks so much for joining us. Let's, sort of, start at the most basic level that we can. What is a virtual machine? Like, I'm sure a lot of our listeners probably have a bit of an idea, but, like, I think let's, sort of, go--let's go as simple as possible. What is it?
SCOTT: Yeah, so this is actually--this is kind of a tough answer--tough question to answer tersely, I guess, or succinctly. I spent a little time, kind of, researching and took a look at what Wikipedia had to say about it, and, kind of, the most basic definition I could find was essentially, "A virtual machine is a software implementation of a physical computer or server." So--and then, obviously, on Wikipedia there was--it was followed by a few pages of, like, fairly low-level technical detail on virtualization, on hypervisors, on paravirtualized devices, on--the list goes on and on. And then, you know, along with about, you know, two dozen reference links down at the bottom where you could read about more detail. So I kind of stepped back and thought about it and I think, you know, for the purposes of this discussion and just, kind of, for Cloud customers that are looking at GCE or comparing it against EC2 or just trying to understand. Maybe you come from an App Engine background, or you're a front-end developer, or, you know, for, you know, depending on where you coming from, it's best just to, kind of, think of a VM as a trusted, ubiquitous computing environment that lets you decouple the actual code that runs from the hardware that it runs on. So think of it as an abstraction layer that is both secure and ubiquitous, which is--which are kind of the two things that have made virtual machines, you know, as popular as they are.
FRANCESC: So like containers?
SCOTT: So containers is actually a very--that is an excellent example of a computing abstraction that is not yet very secure, and--but it is also not yet quite as ubiquitous as virtual machines. There will be a day, sometime down the road, where all of the security issues get worked out. Like, you can have complete isolation between two containers running on the same host. And they'll be a very secure, trusted sandbox, is kind of how we think about these environments is computing abstraction layers run in these sandboxes. Kind of like a JVM. Kind of like a, you know, like any interpreted language has, like, a runtime environment, and, you know, that runtime environment, that sandbox, has to be proven to be secure and, you know, it's impossible for one container to break out of that sandbox environment to muck with the memory or, you know, other area--a different container's environment, so that you can have a true multi-tenant computing platform, which is really what VMs give you. And the reason why VMs are where they are, is because VMware started this whole game, you know, 12--10, 12 years ago and has basically, you know, taken the--taken the market by storm. Back in the day before VMware, everything ran just on--directly on hardware, and VMware added this virtualization abstraction and it really took off. But, you know, it's been secure. Like, they've been finding all the bugs and making sure that it's secure for the last 10, 12 years. So containers is just, kind of, at the beginning of that curve, I think.
FRANCESC: So--and--other than [inaudible] security concerns that you were mentioning, why could someone choose running a VM, using Google Compute Engine, rather than App Engine or Kubernetes or Container Engine?
SCOTT: Yeah, so, I mean, the simplest answer is because that's what, like, enterprises--anybody coming to the cloud that has existing workloads, either on-prem or on AWS or elsewhere. It's gonna be, like, Compute Engine or, you know, Infrastructure as a Service, is going to be the most familiar and, you know, probably most like what they're currently running. Especially enterprises where most of 'em are running on--in their data centers or in Colo environments or whatever, they're probably using VMware. The majority of them are using VMware, running virtual machines. It's what they're familiar with. This is basically--infrastructure is going to be most customers' onramp into the cloud. And then, once they're on the cloud and they're out of the business of managing their own data centers, like, doing their, you know, three or five year hardware refreshes and dealing with all of that stuff. Once they've moved those applications to the cloud, onto GCE, from there they'll then be able to start decomposing, rewriting, rewriting components, kind of, splitting them out into microservices and taking advantage of some of the other really cool technologies that Cloud Platform offers. And not--our Cloud Platform obviously offers some really amazing, especially big data analytics machine learning. We've got a ton of really amazing products that abstract--that are built at a higher level of abstraction. Like, the machine learning is basically an open source API. So you don't actually have to manage underlying resources if you don't want to. That's the--and Kubernetes is a great example of that, where you just write your code and, you know, write your containers, like, what you want to implement in a container deployed as a microservice and deploy dozens of other ones, and they all start talking to each other. That's really, kind of, the power there. You don't have to think about, like, the hardware, how much RAM, how many cores. Like, am I using it efficiently?
All that other stuff. You just stop thinking about it. And that's the advantage of moving up the stack. But as far as getting workloads onto the--into Cloud, the first step, the onramp, is gonna be GCE.
MARK: Apart from that, sort of, general onramp to come into the cloud, are there any particular use cases you see that people are like, "Oh, VMs are just much better for this." Than, say, maybe some of the more shiny technologies out there?
SCOTT: So I think, you know, the--it really, kind of, depends on what people like to focus on. Like, you know, at the end of the day, virtual machines give you absolute control over every layer. You know, every piece of the software you control. You are the one that writes all the code to do, like, auto scaling up and down and, you know, basically, scaling your--the number of VMs that you have based on the incoming demand and, you know, requests per second. Whatever the--whatever the metric is. So really, for the very sophisticated customers or applications that need control over every aspect of their--of their application, virtual machines is probably gonna continue to be the go-to product, or go-to technology, for some of those. There's also stuff like, you know, if you have to run something like SQL Server or something like a third-party software package that doesn't really have, like, a containerized version. Like, or it doesn't--it's not available, like, to an App Engine app, like, you know, like a video encoding library, for example. Like, we have tons of App Engine customers that basically use App Engine to orchestrate--to do the front end serving, but then also to orchestrate, like, kind of, transcoding workloads, like, media rendering, that sort of thing. And, you know, that's where they basically break out of the App Engine layer, use GAE, App Engine Flex, to run GCE VMs but still managed by App Engine. And those VMs, the only thing that they're doing is pulling, you know, videos or audio off of a queue and transcoding it. Because you can actually run lib, you know, FFmpeg and the other transcoding libraries, which are typically written in C, you can just run those in a VM, where you can't really run those in App Engine without rewriting them as a Python library or something like that, which is just too much work.
FRANCESC: I think you just went exactly through the architecture of the--of Cloud Spin, presented at Google I/O. It was exactly that.
SCOTT: Yep. Yep, it's a very popular--very popular model, and App Engine makes it really--App Engine plus App Engine Flex make it really easy to manage the VMs. You still--you know that they're there, but it handles all of the orchestration and so you don't have to. And it just makes it super simple.
FRANCESC: So I have one more question regarding this trend towards containers. So you said that for VMs, this is, like, a very good place to--when you're doing my favorite phrase, lift and shift, to get started on the Cloud. Then, some people move to containers, and what is your opinion on this trend towards serverless computing? Taking into account that all your business is selling servers.
SCOTT: So serverless is--I mean, there's a couple interpretations. So, like, one interpretation of what that means is, "I just write code. I don't have to think about, like, the underlying deployment orchestration. I don't have to think about, you know, how that code gets scaled up and scaled down to handle increased load like the diurnal cycles, that sort of thing." So it basically just means that I--the product or whatever it is that I'm programming against has abstracted away the fact that under the covers, there's servers that are connected over networks and stuff like that. So that's, kind of, one interpretation. Another one is just, look, I just have, you know, like, kind of, like, the Cloud Functions model or the AWS Lambda model, where you're just writing, like, these little snippets of code that you need to run as--when some trigger comes in, you know, like, I don't know, like, "I've just pushed a new request onto a, like, an entry onto a task queue. I want that to trigger a Cloud Function to run and pull that off the task queue and do something to it, and then push it back onto a different task queue." Or something, and that's really where, you know, something like Cloud Functions. You really aren't even--you're worried about even less. You just write a function. Essentially, what is a--what ends up being like a function in whatever your favorite language is, and then, that--you hook it up. You have--you set up a trigger and then you--that function goes off and does something, which may trigger other functions to run, which, you know, gives you this ability to, kind of, just write individual functions, hook them all together, and you end up building, like, these, kind of, crazy-complicated applications that never have to touch, you know, never have to even think about, like, the notion of even a container. Like, it's completely abstracted away even beyond that.
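The trigger model Scott describes (push onto one task queue, a function fires, pulls the entry, and pushes a result onto another queue) can be sketched in a few lines. This is a toy, in-process illustration only: `TaskQueue`, `on_push`, and `handle_task` are hypothetical names made up for this sketch and are not the Cloud Functions or AWS Lambda API.

```python
from collections import deque


class TaskQueue:
    """Hypothetical in-memory queue that fires registered functions on push."""

    def __init__(self, name):
        self.name = name
        self.items = deque()
        self.triggers = []

    def on_push(self, fn):
        # Register fn as a trigger; usable as a decorator.
        self.triggers.append(fn)
        return fn

    def push(self, item):
        self.items.append(item)
        # Every push "triggers" the functions hooked up to this queue.
        for fn in self.triggers:
            fn(self)


incoming = TaskQueue("incoming")
outgoing = TaskQueue("outgoing")


@incoming.on_push
def handle_task(queue):
    # Pull the entry off the queue, do something to it,
    # then push the result onto a different queue.
    task = queue.items.popleft()
    outgoing.push(task.upper())


incoming.push("transcode video 42")
```

The author of `handle_task` never deals with servers or containers here, only with the function body and what it is hooked up to, which is the point Scott is making.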
FRANCESC: Cool. So since we're talking about Compute Engine, and you are pretty much the expert on the topic, what are your favorite features of GCE?
SCOTT: So I think that, like, a lot of my favorite features come from, like, kind of, the underlying, flexible infrastructure that we're built on. So, like, two of my--two of the best examples, and my personal favorites, are the per-minute billing that we have and then the automatic sustained-use discounts. So per-minute billing, you know, most of the big hyperscale Cloud offerings right now bill per hour. So if you spin up a virtual machine, you run it for, like, 20 minutes, and then you shut it down. You get charged for an entire hour. For Compute--on Compute Engine, you only get charged for those 20 minutes. And then, if you start it up again and run it for, you know, another 15 minutes, then you only get charged for those 15 minutes instead of it being rounded up to an hour. Now, we do have a 10-minute minimum. So if you spin it up for three minutes, you get charged for 10 minutes. But really, you know, that's--it's pretty significant savings if you're--if what you're doing with the VM doesn't typically take, you know, on the order of multiple hours. And, you know, obviously the longer the VM runs, the less of a--less of impact--less of a savings you get. But that's where sustained-use discounts come in, which is the longer you run the VM for a given month, the cheaper that VM--the per-hour--the per-minute cost of that VM is. So if you run a VM 24/7 for an entire month, you basically get, like, 30% discount off of the list price for that entire month. And we just, kind of, at the end of the month we aggregate all of your VMs up and do this--do the math to, basically, give you a discount over all of the usage based on how long those VMs ran for that month. 
And that's all--especially the per-minute billing, that's really, kind of, comes down to, like, some of the, like, the flexible, real-time data processing infrastructure that we have here at Google that lets us do, you know, near real-time billing and monetization like that, which is, you know, when you're talking about, you know, the number of VMs that GCE runs on behalf of their customers. Like, doing per-minute billing for all of those VMs all the time is actually a pretty significant data processing challenge. But that's the type of thing that Google's good at, and so we've leveraged that--some of that inhouse expertise to solve it. Another of my favorite features is custom machine types. That's basically, when you go to create a VM, you typically specify an--what's called an instance type or a machine type, and that, basically, is, kind of, the hardware spec, like, the virtual hardware spec of the VM. So it's the number of cores you want, how much RAM you want, and, basically, what GCE offers, that none of the major competitors offer, is the ability to customize that. Like, "You know what? I don't need, you know, I don't need eight cores, but four cores is too small. So I'm gonna create a six-core VM, and I happen to know that I need two gigs per core. So I'm gonna create a six core, 12 gigabyte of RAM virtual machine using custom machine types." And that lets me save money over the next highest up. So actually, let me take a quick step back. Like, we also, like, GCE, as well as the other major public clouds, give you a selection of predefined instance types, or machine types. So, you know, Google calls them, like, n1-standard-1, n1-highmem-4, etcetera. Now, the standard highmem, highcpu, really just refers to the RAM to core ratio. Competitors offer, like, AWS offers the c3.4xlarge, c4.2xlarge, etcetera, etcetera. 
So there's the predefined machine types that you get, or that you have to choose from, but if your workload doesn't fit exactly into, like, it's too small for the--for, like, a c4.2xlarge but too small for a c4.xlarge, or too big for a c4.xlarge. You know, there's no--you basically have to buy the next size up or you have to figure out how to tailor your workload so that it fits exactly into one of those predefined shapes. And that's where custom machine types really shines is the ability to tailor the VM shape to the workload not vice versa. And that's especially important for enterprises coming to the cloud because they've probably--what they're probably running on is, like, a Dell server that's got some number of cores and some amount of RAM, and that's, basically, you know, their homogeneous fleet that they've run--they're running right now for all these different applications. And instead of having to, kind of, scratch their heads and figure out how to, you know, adjust the workload so that it fits in with one of these predefined types, they just go create a custom machine type that exactly mimics what they have on-prem, and they're off to the races. It's one less, you know, kind of, pain point or friction point in moving to GCE from an existing on-prem environment.
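The billing arithmetic from Scott's answer can be written down as a minimal sketch. The 10-minute minimum, the 20-minute and 3-minute examples, and the roughly 30% full-month discount come from the conversation; the four-tier table used to reach that 30% is an assumption based on how sustained-use discounts were documented around that time, so check the pricing docs linked in the show notes before relying on it. All function names are hypothetical.

```python
import math

MINIMUM_MINUTES = 10  # the 10-minute minimum mentioned in the episode


def per_minute_billed(run_minutes):
    """Minutes charged under per-minute billing with a 10-minute minimum."""
    return max(MINIMUM_MINUTES, math.ceil(run_minutes))


def per_hour_billed(run_minutes):
    """Minutes charged when every run is rounded up to a whole hour."""
    return math.ceil(run_minutes / 60) * 60


# ASSUMPTION: each quarter of the month is billed at a lower rate
# (100%, 80%, 60%, 40% of list price). Only the resulting ~30% discount
# for a full month is stated in the episode.
SUSTAINED_USE_TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def sustained_use_multiplier(fraction_of_month):
    """Effective price multiplier for a VM that ran this fraction of the month."""
    if not 0 < fraction_of_month <= 1:
        raise ValueError("fraction_of_month must be in (0, 1]")
    remaining = fraction_of_month
    cost = 0.0
    for width, rate in SUSTAINED_USE_TIERS:
        used = min(width, remaining)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost / fraction_of_month
```

For example, `per_minute_billed(20)` charges 20 minutes where `per_hour_billed(20)` charges 60, and a VM that runs the whole month ends up at about 0.7 of list price under the assumed tiers.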
MARK: And that would seem to me, as well, like, a really nice cost savings as well. I know--I do some stuff with gaming companies. And so often they want, like, necessarily a whole lot of RAM, but they don't necessarily need, like, the fastest CPU in the world.
MARK: So when you're in that, sort of, cookie-cutter world, and you have to have both of those at the same time, this is a nice way to say, "Hey, no we don't need that much CPU. That's fine. We've got lots of RAM." So we can really tailor--spend our money exactly where we need it.
SCOTT: Yep. Yes, exactly.
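As a rough illustration of tailoring the VM shape to the workload, here is a small sketch that builds a custom machine type name for Scott's six-core, 12 GB example. The `custom-CPUS-MEMORYMB` naming and the limits checked below (one or an even number of vCPUs, 0.9 to 6.5 GB per vCPU, memory in 256 MB steps) are assumptions recalled from the Custom Machine Types docs of that period and may have changed; the function name is hypothetical.

```python
def custom_machine_type(vcpus, memory_gb):
    """Return a custom machine type name, validating assumed limits."""
    # ASSUMPTION: vCPU count must be 1 or an even number.
    if vcpus < 1 or (vcpus != 1 and vcpus % 2 != 0):
        raise ValueError("vCPU count must be 1 or an even number")
    memory_mb = round(memory_gb * 1024)
    # ASSUMPTION: memory must be a multiple of 256 MB.
    if memory_mb % 256 != 0:
        raise ValueError("memory must be a multiple of 256 MB")
    # ASSUMPTION: between 0.9 GB and 6.5 GB of RAM per vCPU.
    per_core_gb = memory_gb / vcpus
    if not 0.9 <= per_core_gb <= 6.5:
        raise ValueError("memory per vCPU must be between 0.9 and 6.5 GB")
    return f"custom-{vcpus}-{memory_mb}"


# Scott's example: six cores at two gigs per core.
six_core = custom_machine_type(6, 12)

# Mark's example: lots of RAM, not much CPU.
ram_heavy = custom_machine_type(2, 13)
```

Under these assumptions the first call yields `custom-6-12288`, a shape that sits between the predefined four-core and eight-core types.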
FRANCESC: Yeah, there's one of the things--you've been talking a lot about the flexibility of our infrastructure. And one of the places where I think it shows is the fact that we're able to resize persistent disks while the instance is running and the disk is attached to it.
FRANCESC: That is something that the first time I showed it to a customer, they were just impressed with that. It was like, "How is this magic? How does that work?"
SCOTT: Yeah. Yeah, that's funny. That was something that we launched fairly recently, and I don't think it's--I don't think it's really, kind of, hit the, kind of, customer psyche or whatever you want to, kind of, collective customer psyche. Because it is a super powerful feature, and the customers that have encountered, you know, there's a lot of issues that happen that could cause your, like, especially, like, your root boot partition to fill up or, like, your boot PD. Like, typically, we only allocate, like, you know, 10 gigs worth or 20 gigs worth. You keep those small 'cause you don't put any data on 'em and stuff like that. But still, you know, any--most guest OS dump, like, all of their logging information on there, etcetera. And so if, for whatever reason, you're not paying attention, and you don't have any alerts set up, and the root volume fills up with, like, log data or something. Then, all of a sudden, you're now no longer able to SSH into your VM because the SSH relies on being able to write its own logs. And when that fails, like, everything fails, basically. And that's--this is, like, common. Not just on--this is, basically, any Linux, any, you know, server that runs. If you fill up the root partition, bad things happen. And this hot resize is just a super easy and quick way to be like, "Oh, crap. I had a runaway log script or something and my volume's filled up. Let me just go into the UI, click two buttons, and double the size of the PD--root PD volumes. So I can go in and, you know, fix my spammy log file or fix whatever it is that's filling up the root partition." Like, it's just--it's such a convenient and powerful mechanism. And it's not just related to root volumes, obviously. Like, if you've got data volumes that, you know, let's say you use GCS connector to pull a bunch of data from GCS to a local VM to do--to run a bunch of VMs PDs to do a [inaudible] job, and all, you know, you're writing out the results. 
And all of a sudden, the job fails because the data disk that you were writing out to is full. You know what? Like, you just go resize--again, just go resize the disk and rerun the job and all of a sudden you--you're good to go. You don't have to stop the VM. You don't have to shut everything down and detach it and resize it or create a bigger one and copy everything. It's just--it's just super, super powerful.
FRANCESC: Yeah. You could even consider doing that through the API.
SCOTT: Yeah. Yep.
FRANCESC: So you don't even need to even just click a single button. Just everything happens.
FRANCESC: Which is amazing.
SCOTT: Set up a--set up a Stackdriver alert that triggers a, you know, Stackdriver alert that when it gets 90% full, you know, call a Cloud Function that calls the GCE API that resizes the disk up, doubles the size of the disk.
FRANCESC: So many cloud things in that sentence.
MARK: I'm pretty sure our teammate Terry has a little script that he can--he demoed, showing off doing exactly that. Where it keeps track of how much data you've got and how much free space you have. And then if it runs out, it's like, "Oh, I'm just gonna double your partition, here."
MARK: It'll be fine.
SCOTT: But you might want to look at this at some point to figure out why it's filling up. But in the meantime, just go--just go fix it.
MARK: Yeah, in the meantime.
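The "double it when it hits 90%" idea can be sketched as a small decision function plus a check against the local filesystem. This is not Terry's script and it does not call the Compute Engine API: `resize_disk` is a hypothetical callback standing in for whatever actually performs the resize, and the 90% threshold and doubling factor are taken from the conversation.

```python
import shutil


def next_size_gb(size_gb, used_gb, threshold=0.9, factor=2):
    """Return the size the disk should have: doubled once usage crosses the threshold."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    if used_gb / size_gb >= threshold:
        return size_gb * factor
    return size_gb


def check_and_grow(path, size_gb, resize_disk):
    """Measure usage of the filesystem at `path` and request a resize if needed.

    `resize_disk` is a hypothetical hook, e.g. a function that would call
    the cloud provider's disk resize API with the new size in GB.
    """
    usage = shutil.disk_usage(path)
    used_gb = size_gb * usage.used / usage.total
    new_size = next_size_gb(size_gb, used_gb)
    if new_size != size_gb:
        resize_disk(new_size)
    return new_size


requested = []
current = check_and_grow("/", 10, requested.append)
```

As Scott says right after, growing the disk only buys time; something still has to find out why it filled up.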
MARK: And that's really cool. And I think that touches on some other stuff too. Like, for example, we can do a bunch of stuff to your infrastructure as it's running. Like, do you want to talk a little bit about live migrations and how that works?
SCOTT: Yeah, sure. So, you know, the--one of the biggest challenges for a cloud provider is keeping the software and hardware infrastructure that all the VMs run on, keeping it up to date. Keeping it patched, secure. Rolling out new versions of all the different layers of that stack, etcetera. And, you know, if it's something significant like upgrading the host kernel that is running on the server that your VMs are running on, like, obviously, that's a very, you know, disruptive action for VMs or anything that's running on that host. You basically have to, like, you know, take everything off the host, what we call drain. You have to drain the host and do the upgrade and then reboot it, and it comes up on the new kernel. That's all, like, and so what--within that reality, in the reality of maintaining these systems and keeping them all up to date and patched, what we've done for GCE is built this technology called live migration. This is not like, you know, we didn't invent the technology. Like VMware has been doing, you know, their own version of live migration, they call it vMotion, for many years. But we're, like, the first public cloud, hyperscale provider to do live migration at the scale that we're doing live migration. Like, obviously, we run lots and lots of VMs across many, many zones, and we use live migration quite a bit to make sure that all of our underlying systems and--are patched and updated and all without disrupting the VM. So what a live migration does is it, basically, your VM is running on one host machine. And let's say we got to take that host machine offline to do a kernel--host kernel upgrade. So what we do is we, you know, trigger live migrations for the VMs that are running on that host machine. And what a live migration does is, while the VM is running, it starts copying the state over. 
It creates a new, like, virtual machine target on a different host, starts copying all of the state, like, RAM, the VM--the specification for the VM, the machine type, all of the different, you know, PDs and things like that, network connections. Basically, creates an exact mirror of that virtual machine on a different host, and then, once we've copied the majority of the state, we pause the source VM briefly and then unpause it--copy the rest of the remaining state. And then unpause it on the target. And now--that VM is now running on a different host machine without any impact--well, you know, with minor performance and a very brief brownout--very brief blackout window which is what we call that pause period that it's actually paused doing that last transition. And, you know, that's--for most applications, like, that pause has--is completely invisible. Like, and the--there is, like, very little performance degradation for the majority of our virtual machines. And so most customers just go--don't even, you know, their workloads don't even notice that we've done this. And, you know, it's very, very powerful compared to the alternative which is, essentially, scheduling reboots and, you know, "Oh, we need to roll out a security patch for the, like, the latest Heartbleed vulnerability or whatever." So that required, you know, our competition to reboot, like, a significant chunk of their customers' VMs. And so there was, like, this huge flurry of activity if you're a customer there that--where you had to go, basically, figure out when the best time you were gonna be able to reboot in the next few days. Because the machines that were--or were running on--that your VMs were running on were vulnerable. And Google did the same thing. We patched all of our host machines. But we did it all using live migration. So we didn't have to require any reboots.
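The copy-while-running, then pause-and-finish sequence Scott outlines resembles what is generally called pre-copy migration. The toy simulation below illustrates that general idea only; it is not Google's implementation, and the page model, `dirty_schedule`, and `pause_threshold` are all assumptions invented for the sketch.

```python
def live_migrate(source_memory, dirty_schedule, pause_threshold=2, max_rounds=10):
    """Simulate a pre-copy migration of a guest's memory pages.

    source_memory: dict of page id -> contents, mutated as the guest "runs".
    dirty_schedule: list of dicts; entry i holds the writes the guest makes
        while copy round i is in progress.
    Returns (target_memory, rounds_while_running, pages_copied_while_paused).
    """
    target = {}
    dirty = set(source_memory)  # nothing copied yet, so every page is pending
    rounds = 0
    # Phase 1: copy pending pages while the guest keeps running and
    # keeps dirtying some of them again.
    while len(dirty) > pause_threshold and rounds < max_rounds:
        to_copy, dirty = dirty, set()
        for page in to_copy:
            target[page] = source_memory[page]
        writes = dirty_schedule[rounds] if rounds < len(dirty_schedule) else {}
        for page, value in writes.items():
            source_memory[page] = value
            dirty.add(page)
        rounds += 1
    # Phase 2: the brief "blackout": guest paused, remaining pages copied,
    # then the guest resumes on the target host.
    for page in dirty:
        target[page] = source_memory[page]
    return target, rounds, len(dirty)


memory = {page: "v0" for page in range(8)}
schedule = [{0: "a", 1: "b", 2: "c"}, {1: "d"}]
target, rounds, paused_pages = live_migrate(memory, schedule)
```

In this run two copy rounds happen while the guest is running and a single page is left to copy during the pause, which is why the pause can be short relative to copying all of memory at once.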
MARK: Excellent. Well, that sounds like super interesting stuff. I have to ask this, though. Feel free to dodge the answer. I totally understand.
MARK: What sort of new features could we possibly be seeing for Compute Engine? I know we had Next recently, and we had a bunch of announcements there. But can you hint at anything? Can you touch on anything? I wouldn't be doing my job if I didn't ask.
SCOTT: Yeah, sure. I can--I can talk in general terms. Like, so at--as you mentioned, GCP, the most recent GCP Next that we had down in San Francisco, we did talk about the fact that we're gonna be adding more regions and zones over the next few years. And so, you know, we are very serious about, like, continuing to expand our footprint across and into more regions that are--that our customers are asking for and that provide, you know, the locality that different workloads require. And so the first two of those areas that we're expanding to is gonna be Oregon and Japan later this year. And then, you know, obviously, in 2017, there's gonna--we're gonna be expanding into even more areas, more regions. So we'll be opening up new regions and zones. And that's gonna be a huge--that's gonna be a huge benefit to customers so that they can run their workloads where it makes the most sense and is where the--near where their users happen to be. So that's one of the big things that we're doing over the next couple years. And, without getting too specific, you know, we are also working on features that make it easier to troubleshoot your VMs. You know, kind of like that hot grow PD thing that we talked about. Like, you know, if something goes wrong with the VM when it's booting up, like, it's often very challenging to figure out what went wrong. And so we're working on tools to make that a lot easier. Again, think of the hot grow analogy where I just click a few buttons or call an API and it's solved. This is somewhat similar. Like, you try to make it as simple as possible to figure out what went wrong and fix it without having to do the whole dance of, you know, deleting the VM without deleting the disk, attaching the disk to another VM, figuring out what's the problem, and, you know, basically, there's this whole sequence. It's a real pain right now, and we want to make that easier. 
And then also, the other--another thing we want to--we're working on are, kind of, more enterprise-friendly pricing and usage models. You know, enterprises have quite a huge spectrum of requirements when it comes to how they do budgeting and how they figure out what they're gonna spend their budgets on and that sort of thing. And so more enterprise-friendly pricing models are definitely something that we're working on to be able to make it easier for more enterprises to easily shift stuff over to Cloud Platform. And then, obviously, we continue, you know, doing the--we'll, like, start turning the crank on the bigger, faster, cheaper, you know, computing. So that's--those are, in general terms. And if you're, you know, anybody out there who's evaluating GCE, or GCP in general, and you want to hear more about some specific details. You know, please reach out to me on Twitter, and we can have a much more low-level chat and much more detailed chat under NDA, so.
FRANCESC: Cool. Well, I think we're pretty much running out of time, but is there anything else that you would like to mention, some topics that we might have missed?
SCOTT: I think the biggest thing, if I could send everyone away with one to-do or one action, I would say check out cloud.google.com if you haven't yet. If this is the first time you're hearing that Google has a cloud platform, hopefully not, because you've got some regular podcast subscribers and so people are familiar with it, but tell your friends. Check out our website. There's a ton of amazing products that we have. Data and analytics, machine learning, your basic GCE, your App Engine, which is just write your code and deploy it and forget about it, and pretty much everything in between. It's a very comprehensive platform, and we're really, really pleased to see the growth that we're seeing. But we'd always love more customers. And so please check it out.
MARK: All right. Well, thank you very much for joining us, Scott. It was an absolute pleasure to talk to you.
SCOTT: Yeah, it was great talking with you guys too. Thanks for having me again.
FRANCESC: Thank you. Thanks again to Scott Van Woudenberg for such an interesting conversation.
MARK: Yeah, I really enjoyed it. It's good to see, you know, people still talking about the basic building blocks of the cloud.
FRANCESC: Yeah, virtual machines are a real thing even though they're called virtual.
FRANCESC: So yeah, very interesting. Let us know if you have any questions. We have the contact information for Scott in the show notes, and apparently, he's very keen on getting your questions and opinions.
MARK: Yep. Yep, and we're always looking for more questions of the week too. So, you know, that.
FRANCESC: Absolutely, yes. Yes. And talking about questions of the week, let's go with the question of the week, which for this episode comes from Noam Frankel, and he says, "Hi, Francesc and Mark. It's a great podcast you have..."
MARK: Thank you.
FRANCESC: Thank you very much. And basically, he says he listened to the episode on Humble Bundle, and we mentioned--so Andy Oxfeld mentioned that they were backing up Datastore to BigQuery via Google Cloud Storage. And he's, basically, asking, "What is this magic? How does this work?"
MARK: Right, so I had a look into this. It's actually kind of interesting. There have been ways to back up Datastore for a long time. A lot of them have been fairly manual. So we've got some links in the show notes. There is a way you can go in and enable the Datastore admin, and then go from there to back up all of your Datastore data in one big go. There's no incremental backup, but you can take the whole thing and say, "Okay, I want these particular kinds within Datastore," and then just put those in a storage bucket. And from there, once you've got that, it's relatively trivial to be able to point at that Datastore backup inside Cloud Storage and say, "Hey, BigQuery, can you ingest this, please?" One of the cool things to note is that you can do scheduled backups. They are full backups, though. So keep that in mind with your quotas and how much you want to be reading and writing from your Datastore. You can do it by specifying a cron file on App Engine. Like, you can totally do that. It's a little bit of custom code, yeah, but you can get that up and running, depending on how often you want to be doing these full backups of this data. But yeah, then you can take those backups, you can ingest those into BigQuery, either through the admin or through the API, and then you can access that data from BigQuery and do your analytics and all that sort of fun stuff.
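As a rough illustration of the scheduled backup Mark mentions, the sketch below builds the kind of cron entry that points App Engine cron at the Datastore admin backup handler. The handler path, query parameter names, and `target` value are written from memory of the legacy Scheduled Backups docs linked in the show notes and should be checked against them; the backup name, kinds, and bucket are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: handler path as recalled from the legacy Datastore admin docs.
BACKUP_PATH = "/_ah/datastore_admin/backup.create"


def backup_cron_url(name, kinds, bucket):
    """Build the URL a cron job would call to start a full backup.

    Each kind becomes its own repeated `kind` parameter, and the backup
    is written to a Cloud Storage bucket rather than Blobstore.
    """
    params = [("name", name)]
    params += [("kind", kind) for kind in kinds]
    params += [("filesystem", "gs"), ("gs_bucket_name", bucket)]
    return BACKUP_PATH + "?" + urlencode(params)


def cron_entry(name, kinds, bucket, schedule="every 24 hours"):
    """Return one cron entry as a dict, mirroring the fields of a cron.yaml item."""
    return {
        "description": "Full Datastore backup: " + name,
        "url": backup_cron_url(name, kinds, bucket),
        "schedule": schedule,
        # Assumption: the built-in module that served the admin handlers.
        "target": "ah-builtin-python-bundle",
    }


# Hypothetical names, for illustration only.
entry = cron_entry("nightly", ["Player", "Score"], "example-backups")
print(entry["url"])
```

Because every run is a full backup, the schedule is the main knob for controlling how many Datastore reads the job consumes.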
FRANCESC: Yep, nothing really fancy really, just a bunch of things that you need to plug on to each other, and then everything just works.
MARK: Everything should just work. So yeah, I mean, the admin will create jobs for you and run them through task queues and do all that sort of stuff in the background. It's worth noting, like I said, it's not incremental. It's full backups. So if, for example, you're getting multiple kinds, they could be out of sync a little bit. So, you know, it's not gonna take a complete snapshot. So you, sort of, have to take that into account. But for stuff you're pointing to BigQuery, I'm thinking that's probably less important.
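The ingestion step described above, pointing BigQuery at a Datastore backup in Cloud Storage, can be sketched as the body of a BigQuery load job. This only builds the request dictionary and does not call the API; the bucket, file, project, dataset, and table names are hypothetical. The `DATASTORE_BACKUP` source format and the `.backup_info` file requirement come from the BigQuery docs linked in the show notes, while truncating the table on each load is an assumption that suits full, non-incremental backups.

```python
def datastore_backup_load_config(backup_info_uri, project, dataset, table):
    """Build a BigQuery load-job body for one kind from a Datastore backup.

    BigQuery loads one kind per job, identified by that kind's
    .backup_info file in Cloud Storage.
    """
    if not backup_info_uri.startswith("gs://"):
        raise ValueError("backup must live in Cloud Storage (gs://...)")
    if not backup_info_uri.endswith(".backup_info"):
        raise ValueError("point at the kind's .backup_info file")
    return {
        "configuration": {
            "load": {
                "sourceFormat": "DATASTORE_BACKUP",
                "sourceUris": [backup_info_uri],
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # Assumption: each full backup replaces the previous table contents.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# Hypothetical names, for illustration only.
job = datastore_backup_load_config(
    "gs://example-backups/nightly.Player.backup_info",
    project="example-project",
    dataset="game_analytics",
    table="player",
)
print(job["configuration"]["load"]["sourceFormat"])
```

Since kinds are loaded one job at a time from a backup that is not a single snapshot, tables for different kinds may be slightly out of sync, which matches the caveat in the conversation.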
FRANCESC: Yeah, I guess that if you really wanted something that could do it weekly or daily, or something like that, you could always build your own thing that could generate that, like, it could just run a query for all the data that was created during the last day or so.
FRANCESC: But you're gonna have to do more manual things. If all you're going to do is take the whole thing and just dump it into BigQuery, that sounds like a good idea.
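The do-it-yourself incremental export Francesc describes boils down to selecting only the entities created since the last run. A minimal sketch of that filter follows, using plain dictionaries as stand-ins for Datastore entities. The `created` field is an assumption: entities only carry a creation timestamp if the application writes one.

```python
from datetime import datetime, timedelta, timezone


def created_since(entities, cutoff):
    """Return the entities whose `created` timestamp is at or after the cutoff."""
    return [entity for entity in entities if entity["created"] >= cutoff]


# Fixed "now" so the example is reproducible; the data is made up.
now = datetime(2016, 6, 1, 12, 0, tzinfo=timezone.utc)
entities = [
    {"id": 1, "created": now - timedelta(hours=30)},
    {"id": 2, "created": now - timedelta(hours=2)},
]

# Select only what was created during the last day.
recent = created_since(entities, now - timedelta(days=1))
print([entity["id"] for entity in recent])
```

In a real export the same cutoff would be expressed as a filter on the query itself, so that only the new entities are read.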
MARK: Yep. In some cases, this also could push towards, like, a good argument for using things like Pub/Sub.
MARK: So that, you know, when you go to store your data, it goes to your Datastore, but another subscriber maybe picks that data up, shoves it off into BigQuery as well, and then you can keep that data alive in there and have multiple things do stuff based on those actions. And Pub/Sub's a really great fit for that.
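The fan-out Mark describes, one write event picked up by several independent subscribers, can be sketched with a tiny in-memory publish/subscribe bus. This is a stand-in to show the pattern, not the Cloud Pub/Sub client library; the class, the topic name, and the two lists playing the roles of Datastore and BigQuery are all hypothetical.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Toy message bus: every subscriber to a topic receives every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the same message to each subscriber independently.
        for handler in self._subscribers[topic]:
            handler(message)


primary_store = []   # stands in for the Datastore write
analytics_rows = []  # stands in for rows streamed into BigQuery

bus = InMemoryPubSub()
bus.subscribe("entity-saved", primary_store.append)
bus.subscribe("entity-saved", analytics_rows.append)

bus.publish("entity-saved", {"kind": "Score", "value": 42})
print(len(primary_store), len(analytics_rows))
```

The publisher does not know who is listening, so adding a third consumer later means adding one more subscription rather than changing the code that stores the data.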
FRANCESC: Yeah, absolutely. Cool. So before we finish, why don't we discuss--why don't we talk about our next plans. What are you gonna be doing?
MARK: What am I gonna be doing? So this week, we still have Change the Game here in San Francisco, very excited for that.
FRANCESC: What day is that?
MARK: That is a game conference we're running here in San Francisco. We're gonna have a bunch of customers come by, some Googlers. I am co-presenting twice. So I guess that means I'm doing one presentation. So talking about multiplayer games, analytics for games, all sorts of cool stuff. It's gonna be a really fun event. I know there's gonna be a bunch of Googlers around as well. I will be there, which I'm sure just makes it extra, extra special.
FRANCESC: I really like that my question was actually what day is that, but you don't understand me, so.
MARK: Oh, what day is that? I'm sorry. I thought you said, "What's going on there?"
FRANCESC: No, no, no. What day? What day? What day? What day? What die? What die is that?
MARK: Okay, okay. That is on Friday.
FRANCESC: Oh, cool. June third.
MARK: So it'll be a couple of days after the podcast comes out.
MARK: Yeah, after that, I'll be presenting at GDG Twin Cities for their meetup. I will be at Open Source North. I will be at dev.Objective(), and then I'll be in New York for a gaming panel towards the end of the month. I haven't got all the details of that yet.
MARK: Yeah, I'm doing some travel this month. It's gonna be crazy.
FRANCESC: Wow, busy month. Yeah.
MARK: Yeah, and then, I'm taking a little break and spending a week in New York. 'Cause why not?
MARK: I know you're preparing for your long biking ride.
FRANCESC: Yes, I'm gonna be biking from San Francisco to Los Angeles, and I'm leaving on Sunday.
FRANCESC: So next week I'm gonna spend my whole week on a bike. I'm very excited. Very excited, slightly scared, but I think it's gonna be fun.
MARK: Yeah, based on your level of sunburn, it looks like you're well prepared.
FRANCESC: Yeah. Yeah, that was from yesterday, yeah. Yeah, and after that, I'm gonna be taking a couple weeks of relaxing and writing stuff and trying to be more productive without traveling. And after that, I will go to GopherCon.
FRANCESC: Yep, but that's in July. So we have a couple weeks of recording podcasts before that.
FRANCESC: Well, thank you so much for joining me today.
MARK: Yeah, thank you so much for joining me as well, Francesc.
FRANCESC: And talk to you next week.
MARK: I'll speak to you next week.
Francesc Campoy Flores and Mark Mandel