Rugby and ML with Capgemini

Brian Dorsey and Mark Mirchandani are back this week with guests from Capgemini as we learn all about ML in the rugby industry. Priscilla Li, Head of Applied Innovation, and Aishwarya Kandukuri, Data Scientist, start the interview explaining what they do at Capgemini and how the company uses new technologies to enhance projects with their partners.

When Capgemini became the official global innovation partner of the HSBC World Rugby Sevens Series, they were tasked with creating new ways for fans to use technology to further their experience. Priscilla and Aishwarya explain how they created a series of digital projects to accomplish this goal, and how the experience inspired them to use AI to automate aspects of the actual rugby games, such as identifying a scrum. They explain the challenges of these projects and how they conquered those challenges, as well as ways it has benefited the rugby commentators, players, and fans.

Later, they talk specifics regarding the process of tagging images and audio to use in AI projects and things they learned along the way. Priscilla and Aishwarya wrap up the interview with advice for others who may want to tackle a similar project.

Priscilla Li

Priscilla Li is a leader of Applied Innovation Exchange in the UK.  Her purpose is to apply innovation in ways that advance humanity with a team that is diverse in gender, thought, discipline and experience.  Together, they shape ideas and breathe life into them through the application of emerging technologies with a human perspective.  Priscilla has held leadership roles in innovation and technology, advising and implementing innovative solutions across industries in telecommunications, transport, public sector, and media.  As a founding member of Artfinder, funded by Silicon Valley and UK Venture Capitalists, she delivered the first image recognition technology to discover, share and sell art. In 2012, she was selected by Business Weekly as one of the top Cambridge entrepreneurs and received the Chairman’s Award for excellence at American Airlines. In the Applied Innovation Exchange, she continues to bring to life the art of the possible, collaborating with start-ups, academia, and the wider community to unlock new opportunities for growth and meaningful transformation.  Grateful for her journey, she hopes to inspire women to be pioneers, unencumbered by the reality of today, but energised by the promise of tomorrow.  

Aishwarya Kandukuri

Aishwarya Kandukuri is a Data Scientist in Capgemini’s Insights & Data Practice. Her role involves testing ideas and concepts by analysing data and building machine learning models using emerging technology. Aishwarya works with an interdisciplinary team to drive business solutions. She has worked on projects across a range of industries, applying machine learning to solve complex business problems and meet customer needs. She continues to seek innovative approaches and explore new technologies to achieve long-lasting solutions.

Cool things of the week
  • From raw data to machine learning model, no coding required blog
  • Helping contact centers respond rapidly to customer concerns about COVID-19 blog
    • How can Chatbots help during global pandemic (COVID-19)? blog
    • Verily COVID-19 Pathfinder virtual agent site
    • COVID-19 Rapid Response Demo on GitHub site
    • Deconstructing Chatbots video
  • Recent Podcasts with Priyanka:
    • GCP Podcast Episode 188: Conversation AI with Priyanka Vergadia podcast
    • GCP Podcast Episode 195: Conversational AI Best Practices with Cathy Pearl and Jessica Dene Earley-Cha podcast
Interview
  • Capgemini site
  • Altran site
  • Rugby sevens partnership and technology site
  • AWS Kinesis site
  • AWS Fargate site
  • Applied Innovation Exchange on Medium blog
  • Emerging technologies in sports site
  • Applied AI within a Pop-Up store: a collaboration between Action for Children and Capgemini AIE video
  • To get the Quarterly Applied Innovation UK newsletter email
  • TensorFlow site
  • Firebase site
Question of the week

We talk to our friend Zack about how we could build something similar with ML! AutoML might be the way to go!

Where can you find us next?

Capgemini will be at more What’s Now London Events with topics like Disrupting The Field

Brian has been working on videos like Rethinking VMs - Eyes on Enterprise. He’s also been live streaming with Yufeng in Adventures with Yufeng on VMs.

Mark will be making more videos like Kubeflow 101.

[MUSIC PLAYING] MARK: Hey, everyone. And welcome to episode 216 of the weekly Google Cloud Platform Podcast. As always, I'm Mark. And I'm here with my friend and colleague Brian.

BRIAN: Hello, hello.

MARK: How are you doing, man?

BRIAN: All right. That's been my cap lately. I'm happy to have reached all right.

MARK: You know what? It is totally in a good place right now to be all right. And sometimes, that's just exactly what we need. Well, we have got plenty of good content for you. And actually, it's a very ML-themed episode on this week's podcast where we talk to Capgemini. We're going to talk to Priscilla and Aishwarya from Capgemini, talking to them about rugby and machine learning, two things that might seem very, very different, but two great tastes that taste great together. They give us a cool overview of some of the work they've done. And we talk about a sample application they built on top of rugby footage.

BRIAN: And keeping with that theme, we're going to continue on to the question of the week and see how you might be able to build things like that, or other things around the data you have, with machine learning with Zack.

MARK: So very, very ML-oriented. But before we get into any of that, let's talk about our cool things of the week.

[MUSIC PLAYING]

So the first cool thing I have of the week is-- surprise, surprise-- ML-oriented. Actually, I just saw this. And it's a very, very cool blog post, but it's actually kind of an outline and a tutorial of how to go through this great example of setting up a machine-learning model without using any code. So Google Cloud has a product called AutoML that helps you kind of generate a machine-learning model without having to write any of the code around it.

And Karl Weinmeister, who I also worked with on the Kubeflow 101 videos and who does a lot of machine-learning work, has gone through and detailed it out: let's take this public data set, let's walk through how to clean it up in BigQuery, how to get the data prepared, and then how to hook into that with AutoML. So it's actually a super cool exercise, and I highly recommend it. Especially if you're not as familiar with this ML stuff, this is an incredibly good introduction to it. And if you are super familiar with it, then this could be a great way to open up your mind to new perspectives on how to build ML models and maybe look at, OK, here's another cool tool that I can have in my wheelhouse.
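
The tutorial itself is no-code, but if you would rather script the data-prep step than click through the console, a minimal sketch with the BigQuery Python client might look like the following. The project, dataset, and output table names are hypothetical placeholders, and the public natality sample is only used here for illustration, not taken from Karl's post.

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

# Hypothetical cleanup query: read a public table, drop rows with missing
# values, and write the result to a table that AutoML can later import.
query = """
CREATE OR REPLACE TABLE `my_project.my_dataset.training_data` AS
SELECT *
FROM `bigquery-public-data.samples.natality`
WHERE weight_pounds IS NOT NULL
  AND mother_age IS NOT NULL
"""

job = client.query(query)  # start the query job
job.result()               # wait for it to finish
print("Prepared table: my_project.my_dataset.training_data")
```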

BRIAN: And the next one definitely has ML involved, but it's a little more behind-the-scenes and topical to the times. It's around COVID-19. And all of us just have so many questions. And then they get more and more specific as you get closer and closer to businesses. So Verily, who's part of the overall Google Alphabet umbrella, working with the Cloud folks and some others, have created a set of templates for how to make chatbots around COVID-19 questions and pulling from actual data from the CDC and trusted sources. So that's pretty interesting to take a look at.

And one of our DAs, Priyanka, has written a really nice overview of how to use the chatbots for that, but then also kind of extend it to things that might be related more to your business or to your family, your health, some other things there, so you've got the overall big picture from government and health care organizations. She mentioned some other areas like mental, financial well-being related things, some transactional stuff.

Restaurants are having a challenge adapting to-- they're mostly moving to delivery or pickup. Chatbots can help navigate some of that, hey, I'm here. Oh, OK, we'll bring it out-- that sort of thing. A lot of stuff around retail or coordinating volunteer efforts. And if you're involved in any of these things, I really encourage you to check out her blog post. It's a really great place to start with that. And I'm very excited to see how we can kind of apply this to help magnify all of our efforts to deal with this as best we can.

MARK: Yeah, like you said, it's very topical right now, obviously, dealing with a lot of the challenges that are coming up. And Priyanka has been on the podcast a couple of times talking about Dialogflow and conversational AI. So I highly recommend. If you're interested in that kind of stuff, definitely check it out. Priyanka has a YouTube series, "Deconstructing Chatbots," where she went through a whole bunch of examples of Dialogflow, setting up chatbots, but then taking all that knowledge and then applying it to something very present and very real, but really looking at the challenges that people are facing. It's really great to see that.

BRIAN: On that topic of keeping it real, if you write one of these dialogues, how do people actually interact with it?

MARK: So Dialogflow is this platform where you can build out these things. And if you want to hear more about it, there's a couple of different episodes. I guess we'll put some in the show notes where we go into a little bit more detail. But when you build out a Dialogflow application, you do it through a series of intents. And then you have integrations, so that people can interact with it through a chatbot, which you might put on a website. You can do that same chatbot as an SMS or texting thing where you're able to text back and forth. And you can even integrate it with voice systems-- say, Google Assistant, for instance. So you have all these different ways to interact with it, but the core idea is that Dialogflow is handling that entire conversation from the AI perspective.
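
For a rough idea of how an application hands a user's message to Dialogflow and gets an answer back, here is a hedged sketch using the Dialogflow Python client's detect_intent call. The project ID and session ID are placeholders, and the utterance is just an example.

```python
# pip install google-cloud-dialogflow
from google.cloud import dialogflow

def detect_intent_text(project_id: str, session_id: str, text: str,
                       language_code: str = "en-US"):
    """Send one user utterance to a Dialogflow agent and return its reply."""
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)

    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=text, language_code=language_code)
    )
    response = client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    result = response.query_result
    return result.intent.display_name, result.fulfillment_text

# Example (placeholder values):
# intent, reply = detect_intent_text("my-project", "user-123", "What are your hours?")
```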

BRIAN: And then you can call out to back-end systems to know the status of things or for a particular user, that sort of thing.

MARK: Exactly. We'll go ahead and put some links to the episodes. I think there's two in specific that I'm thinking of. I don't remember the numbers, but we'll put links to them, because they were great conversations that opened up Dialogflow a bit more. And they talk about how to intelligently design a conversation, because when you think about it, without getting too much into it, humans are really good at adapting. And machines have a little bit of trouble with it. So you kind of have to tell them and give them a bit more information. So yeah, we'll absolutely have those down below.

BRIAN: Anything that helps.

MARK: Yeah. And with that in mind, I'd really love to see people get inspired by looking at these use cases and saying, oh, this is something that applies to me through my business, through my hobbies, through my profession, whatever it is. And how can these tools help me solve some problems? Super interesting to see. Well, with all that being said, why don't we go ahead and dive into our main guests coming from Capgemini talking a little bit about their rugby problem and how they solved it using ML?

[MUSIC PLAYING]

Awesome. Well, thank you all for joining us. First off, can you tell us who you are and what you do?

AISHWARYA: So I'm Aishwarya. And I'm a data scientist at Capgemini. And mainly, my work revolves around testing hypotheses and trying out machine-learning algorithms with emerging tech.

PRISCILLA: My name is Priscilla. And I lead the Applied Innovation Exchange in the UK. And, I guess, what we do is we try to solve the most complex problems across all industries, applying emerging technology like AI, blockchain, machine learning. And we try to leverage our partners like Google and others, a wider ecosystem of startups, to make sure that we can do that as quickly and at as much scale as possible.

MARK: So I want to hear more about the Applied Innovation Exchange, but can you give us just a background on what Capgemini does?

PRISCILLA: Sure thing. No problem, Mark. So Capgemini is a global leader in consulting. We focus on digital transformation. And that's to say, again, how we apply technology, whether that's cloud, digital, or platforms, that allow us to realize our client's business ambitions. And a lot of them today, as you can recognize, is around growth and sustaining their business. And so we're there to make sure that we can support them.

I think what people don't know and was a shock to me is that we actually are a group of 270,000 people. So we're not a small business. And we span around 50 countries. And yeah, most recently, we have done an acquisition with Altran. And there's a bunch more companies that we acquired along the way, but Altran is something we're particularly proud of, because it's a $17 billion business that we can have as a result of the acquisition.

BRIAN: Tell it to me straight, because innovation is kind of a squishy term. What do you actually do in Applied Innovation Exchange?

PRISCILLA: I love that you're asking that, because sometimes it feels too high-level. So let's get down to, first of all, who the people are. It's a bit of a group of people that are extremely passionate about the same thing, getting stuff done-- although, they might use different words for that-- but getting stuff done as quickly as possible. And they don't want to reinvent the wheel.

We're a group of people that are not about doing something that's already been done before, so we want to make sure that we leverage everything out there that can solve a problem or direct people to the right place. But if it's something that hasn't been done before, we're that group that can make it happen. And yeah, that's the common passion. And we'd like to do it with technology, obviously.

MARK: So it's kind of like the group that gets to experiment with a lot of different, like you mentioned, cool technologies, things that are coming up, but then also trying to find a way to tie that back to what your customers ultimately want to solve.

PRISCILLA: That's exactly right, Mark. We've grown in the last three years. We might have been more tech-focused, to be honest, in the first year. And then we learned a valuable lesson: what's the point of building tech for tech's sake? So there's a need to make sure that whatever we build is really something that users are going to use and that clients are going to get value from. So that combination of applied is really important to us. Otherwise, what's the point of having something new if no one's using it?

MARK: Exactly. I feel like that resonates with a lot of people, because it's always fun to try out these new technologies, but when you're a business, and especially one this large, you have to have something to do with it, because you need to move the business forward. So with that being said, we actually wanted to talk about one of the cool projects that you had mentioned to us, the Rugby Sevens application. So first off, I don't know anything about rugby. So can we get a quick brief on what Rugby Sevens is?

PRISCILLA: So, I think, first with the context. And I'll be honest with you guys, I'm Canadian. So I'm not massively into rugby myself. I'm more into hockey. So I've learned a few things along the way. This started because Capgemini became a global innovation partner for the HSBC World Rugby Sevens Series.

So, as a result, they said, hey, why don't we throw you some challenges? You guys say you're great at innovation. What can you do to really impress us around the fan experience? What can you do to enhance that against what it is today? So our plan was to build some products and services that could really kind of provoke what they could do in the space, so we could really challenge their thinking.

And we've done stuff from a match predictor game to a virtual reality experience, but this one, this particular project that I think, Mark, you're alluding to, is around how do we apply AI to really automate some of the things that we can get normally from a video feed and a video stream. What can we do with AI that can help us calculate and know the events in a game? So I don't know that I have to explain what rugby is for you, but I think when we get into the details of the challenge, I think you'll see that we learned a lot about the type of plays and the type of events that happen in rugby that are so important.

AISHWARYA: I would say the other interesting thing about this project is that we worked with broadcast footage, footage which has already been shown to the audience. So how do we take that footage we have and actually gain insights from it? It's something which already exists, but how do we use AI and computer vision to actually identify what's a tackle, what is a pass, what is a scrum? To actually use the technology that we have and put it into the perspective of the sport.

And in comparison to other sports such as English football or tennis, I would say rugby is a very heavy game in terms of physical contact. So in rugby, you'd have so much physical contact in the game. There's usually loads of people together and loads of interaction between the different teams, so that when we try and build an algorithm which tries to identify what's a scrum, it's actually very, very difficult, compared to sports such as football where you usually have distances between the two players.

MARK: Wow. So how did you approach that problem? And what were your inputs? It was just raw broadcast footage?

AISHWARYA: Yeah, so we used raw broadcast footage, which was already available. And these are around 15- to 20-minute matches. So we used that data to process them into loads of different frames of pictures and then used algorithms on top, mainly leveraging the platform which already exists to bring out insights, such as when a pass is happening, when a tackle is happening.

And the interesting part is, because the broadcast footage was a live footage, it had loads of moving camera angles. So you'd have a camera angle which takes it from the side, one view from the entire pitch, one from another angle. So I think the real challenge when we were working with this was how to leverage the different angles that we have in a camera to even justify what the event is showing half the time.

MARK: So let's dive a little bit deeper into that, because you obviously have this footage, and it's coming from different angles. It's all video. And you need to determine what an action is-- like you said, if it's a tackle. I still don't actually know what a scrum is. But how do you actually figure out what's going on using what I assume is going to be some vision-based AI?

AISHWARYA: Essentially, what we did was manually tag 30,000 images with the help of manual taggers. So we had seven volunteers from within the team who were always helping us to tag images. And then we passed those tagged images through AWS Kinesis, which is essentially the video-streaming technology that we used. And to process the video footage, we used AWS Fargate. From those tagged images, the system was able to identify bounding boxes around where an event was actually happening. So you essentially put a box around it, which shows that this is a tackle, this is a scrum. And it was able to process and identify in near real time when a certain event was happening.
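
Their pipeline ran on AWS Kinesis and Fargate, so the sketch below is not their code; it is only an illustrative OpenCV example of the frame-by-frame idea, splitting a clip into frames and overlaying a labeled bounding box the way a detector's output would be drawn. The file name and the hard-coded detection are placeholders.

```python
# pip install opencv-python
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("match_footage.mp4")   # hypothetical broadcast clip
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Pretend a detector returned this box for the current frame:
    # (x, y, width, height) plus an event label and a confidence score.
    detection = {"label": "scrum", "confidence": 0.87, "box": (420, 260, 310, 180)}

    x, y, w, h = detection["box"]
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, f'{detection["label"]} {detection["confidence"]:.2f}',
                (x, y - 8), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

    cv2.imwrite(f"frames/frame_{frame_idx:06d}.jpg", frame)
    frame_idx += 1

cap.release()
```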

MARK: So once you get these events, how does that affect the fan experience? Because we're kind of starting there-- how does this circle back around? How do things change for people?

AISHWARYA: So I would say the main aim of the project was essentially for commentators and fans. For commentators, currently they get loads of manual documents, which they go through. And they have spotters around the pitch who count how many times a scrum or a tackle has been made. So then they manually tally those up over the course of the match.

Whereas, I think the objective of the project was essentially how to optimize that in a way that we can get contextualized insights in near real time. And this is mainly to help commentators and also for fan engagement. So imagine having an application where you can constantly just check, oh, OK, he made a score. We got three tries in the game. So kind of having that interaction with the game and improving that interaction was essentially the goal.

PRISCILLA: I think what's also interesting about elevating the fan experience is that when you have this data that's coming in real time, there's these opportunities to compare players, how they're performing, what are their areas of strength. You could actually simulate what this team would do, how well they would do against another, as a match prediction. And again, I mentioned that I'm not American, so I don't really know football that well, but I assume there's something called fantasy football like fantasy hockey. And you could use this to kind of have fun and engage with your fellow fans in different ways. And that wouldn't be possible without that kind of real time data that users can engage with. So I think that's the exciting thing, lots of possibilities as a result.

MARK: Yeah, for fans of sports, obviously, they tend to get really into the sport. And as a result, the more stats and the more information you have, the more they can start doing with that, whether that is something from just understanding the game and who their favorite players are, all the way to doing these full simulations of who would win in this impossible hypothetical. So with that being said, it kind of makes sense that you would develop this application for rugby fans who want to do the same thing. They want to engage with the sport.

I'm curious to hear a little bit more about the tagging process, because, Aishwarya, you had mentioned this is an incredibly manual process. So what does that look like? How do you actually get people to label it? And how important-- I'm assuming very, but how important does the whole labeling process play into the grand scheme of things?

AISHWARYA: So yeah, it was definitely a very manual task that we went through. But essentially, what it looks like is, if you have, for example, a screenshot of a moment within your video footage, what we essentially did was we tagged it-- tagged, as in, I'm talking about making a box around what we see in that frame. So that means, for example, if two people are in contact and it's a tackle, we tag that bounding box and label it. Essentially, we label it as: it's a tackle. And what we also did, which was quite interesting, was that not only did we tag the main event which was happening in the frame, but we also tagged other things which were really interesting in that pitch-- for example, the goalposts, the logos at the back of the frame.

So the reason we did that is because, as I was saying earlier, we were working with broadcast footage, and we wanted to identify what pitch direction we were looking at. So for example, if the entire team is towards the goalpost, they're essentially towards the try area where they're actually scoring. Or if they're towards the middle of the pitch, then we know they're still in the middle of the pitch. So getting a gist of what direction the play was moving in was also something we were looking into. And in terms of sponsorship logos as well, we were tagging the logos at the back to see how the sponsorship interacts with the broadcast footage, how long a sponsorship logo is shown. So that was something for their analytics that we did whilst we were tagging; we tagged those as well.
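
There is no single required format for this kind of labeling, but a tagged frame from a workflow like the one described might be recorded roughly like the hypothetical record below: one entry per frame, with boxes for the main event plus the extra pitch landmarks and sponsor logos they mention. Every field name here is an illustrative assumption, not their schema.

```python
# Hypothetical annotation record for one tagged frame (illustrative only).
annotation = {
    "frame": "match_042/frame_001732.jpg",
    "timestamp_sec": 69.3,
    "labels": [
        {"class": "tackle",       "box": [512, 300, 180, 140]},   # main event
        {"class": "goalpost",     "box": [40, 110, 60, 220]},     # pitch landmark
        {"class": "sponsor_logo", "box": [800, 30, 150, 60],      # partially visible logo
         "visibility": "partial"},
    ],
}
```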

MARK: That's really important. This seems like a good recommendation for other people thinking about going into one of these projects. It's super manual, but while you're there, tag maybe more than you think your original goal is.

AISHWARYA: Yeah, actually, that just reminded me of something else that we did-- so in regard to the audio footage, what we did was identify key areas where a whistle is blown or, for example, when the audience was cheering. So then we actually know that something has happened in the game at that time. So if everyone in the audience is cheering, probably someone has scored a try. So bringing the voice recognition into play as well, in order to identify key moments in the match, was also something that we were looking at.
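
They don't spell out how the audio detection worked, so here is a generic, illustrative approach rather than their method: flag short windows of the match audio whose energy jumps well above the typical level, which is where crowd roars and whistles tend to stand out. The WAV file name is a placeholder and assumes the audio has already been extracted from the footage.

```python
# pip install numpy scipy
import numpy as np
from scipy.io import wavfile

rate, audio = wavfile.read("match_audio.wav")   # hypothetical extracted match audio
if audio.ndim > 1:                               # mix stereo down to mono
    audio = audio.mean(axis=1)

window = rate // 2                               # half-second windows
energy = np.array([
    np.sqrt(np.mean(audio[i:i + window].astype(float) ** 2))
    for i in range(0, len(audio) - window, window)
])

# Windows much louder than the match's typical level are candidate key moments.
threshold = energy.mean() + 2 * energy.std()
for idx in np.where(energy > threshold)[0]:
    print(f"Possible key moment around {idx * 0.5:.1f}s")
```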

MARK: So there's this overall theme here of use all of the things available in your base material.

PRISCILLA: Just to kind of clarify, try is the touchdown, right? I think one of the lessons we learned is in the process of discovery, you have to really be open-ended around what you might be able to find, because we didn't think we would have a working prototype that not only automated events, but essentially gave us a sense of sponsorship impression on video. How much is a partial logo versus a full logo? And that's something that hasn't necessarily been surfaced in terms of video marketing before. So again, wouldn't have been possible if we hadn't opened up that area.

I think one more thing we did, Aishwarya, was I think we even looked at the referees, could we identify a referee. And we had to actually learn about what kind of hand gestures would they use for particular whistle blows and whistle events. And that would allow us to also kind of create the logic around what is likely going to be the event that's being processed at that time.

MARK: That's awesome. So there's all this unstructured stuff just there waiting to be inputs. So this a huge arc. And it sounds like it was successful. If you were to do this kind of project again, what would you do differently? Or are there different technologies you'd consider? What have you learned? And what would you want to try on a next iteration?

AISHWARYA: For this project, essentially, we were working on a time-bound piece where we were constantly learning how to solve the problem using the technology experience at hand. So we went with AWS in order to do that, because our team at the time was more familiar and flexible with those technologies. AWS had an in-built video streaming process that we were able to use. I think, given the opportunity, there are other technologies, like Google's platforms, that we could have explored but didn't dig into too much. Definitely, given the time, we would have spent more time actually trying out different technologies and mapping those out to figure out what would have been the ideal technology to use. So maybe given another opportunity, we definitely would have explored that.

MARK: So, of course, different platforms are going to have different strengths and weaknesses depending on what you're doing. And it sounds like, usually the best way to figure that out is to try it on all these different platforms and identify where-- like, hey, this one worked a little bit better or this one was faster and whatnot. For people who are getting started, it seems like there's a ton of initial investment required in order to do this kind of thing with labeling. Do you have advice for people who are interested in getting started, but maybe don't have the resources to throw at all that manual effort?

AISHWARYA: The reason why we tagged so many images was essentially because we were working with live stream footage, but I think in order to get started, you don't have to tag as many pictures as we did. I think we just did that because, especially with rugby, there was a lot of nitty-gritty detail in those matches. We ended up tagging so much because our results weren't so effective when the models started predicting. For example, a tackle looked really similar to another movement in rugby. So I think that's why we were way too manual in order to tag all of these.

But it depends which use case you're looking at. So if it's more to do with video recognition in general-- and probably other sports apart from rugby as well-- I think it would be a lot easier. And you wouldn't have to go through this manual process. And I think there are also a few tools out there that actually help with tagging. But I think because our scenario was very specific to what we were looking for, we ended up manually tagging them.

PRISCILLA: Yeah, and I think it depends on the use case, as Aishwarya said. We've had so many different problem statements around, can you extract unstructured data from a variety of contracts or commercial documents? And you know the variation there is honestly endless. You can have a table there. And that table might reference a diagram. The diagram might have nothing to do with this image.

So then you have to think about, well, what kind of use case is it? And likely, there might be a situation where it's a blended model. And I think that's something we're also looking into: how do we then create scenarios that we can solve where it's a hybrid situation? What is that blend? And also, what is the trade-off between cost, based on the processing speeds and times, and quality? Because sometimes you don't need that degree of accuracy. So again, linking it back to what is the purpose and what is the use case. Let's not over-engineer that use case; it might just add effort that won't be needed.

MARK: There's another takeaway I'm hearing here-- really spend some time upfront on that design. Stay open-minded, like you said earlier, but then focus it down to what you're actually trying to do when you start to execute. So that's some good lessons for all of us, I think. So is there anything we missed or you'd like to mention before we kind of wrap things up?

PRISCILLA: Yeah, I think, for me, we talked a little bit about our group and the Applied Innovation Exchange. And I would be doing them a disservice if I didn't mention that we try to work with the best partners, including Google. And there's just been a ton of other projects that we've worked on, from building a brick recognition app, which, by the way, used TensorFlow, Firebase, and Google Cloud. And we also did a home assurance project with IoT devices in the home, smart devices in the home, to see if we can disrupt the insurance industry and lower everybody's premiums-- as opposed to raising them when things go wrong, rewarding you when things go right.

So I think, I guess, what I'd leave with you guys is this feeling-- any problem, however different, no matter what sector it's from, there is really some common themes and common learnings. And I think we live in a very exciting time where we can start to apply those learnings across each area. So hopefully, we get to do that with you guys more as well. And anyone on that call, think about Capgemini. But that's something we're very proud of. I think we live in a really exciting time.

MARK: I do have one more question. So what is a scrum?

PRISCILLA: I knew it. Mark, I was actually going to turn it back to you and be like, do you know what a scrum is, Mark?

MARK: I don't! I still, after all this talking-- and I don't know anything about rugby, obviously. What is a scrum in rugby?

AISHWARYA: I was going to say, it's been a long time. So essentially, I think it's when two teams are up against each other in a huddle. And you pass the ball to the team's side in order to get a kick-start off the game.

MARK: OK, so that makes sense.

PRISCILLA: The other way to put it-- so you know the cat analogy of detecting a cat is really hard, because of the spine, right?

MARK: No, what is that?

PRISCILLA: In image recognition, one of the first challenges, I think, in vision-- and correct me if I'm wrong, Aishwarya, is identifying a cat is just extremely hard, because of just how the curvature of the spine is. It's hard to pinpoint specific imagery that is definitely cat, because they can curl up. They can be extended. And to me, when you're talking about guys interlocked, heads down, shoulder to shoulder, in a bit of a circle, and what you see is their backs and behinds, it's really hard to tell what is that. That shape changes so often, and it's really hard.

So I think that's what we learned, right, Aishwarya? That is probably one of the hardest things to bound as an identifier. Whereas, in tennis, you've got one player. And then in football, you might have one, two people colliding. But you have a massive group of people interlocked in some weird shape, and it's tough to do.

MARK: Well, I'd like to thank you all for coming and teaching us a little bit about rugby but a lot about machine learning and what it looks like. So Priscilla and Aishwarya, thank you so much. I believe I know what a scrum is now, so that's good. And I definitely have some people on my team who are offering to give me a quick lesson on rugby, so I might take them up on that to understand it. But it really was great to hear about how they thought about building a system, what goes into kind of training this, and hopefully we can get some people inspired to look at how they might do something similar for whatever use case they have thrown their way.

And, of course, with this rugby ML application they built, I was curious about how you might go about building that from the Google Cloud perspective. And so when I thought, ML and rugby, of course I thought of our own developer advocate Zack Akil who knows a lot about both of those subjects. So we turned to him for our Question of the Week and said, how might you build something similar?

[MUSIC PLAYING]

BRIAN: So Zack, if you wanted to build an application where the input, you basically got a bunch of footage off of TV of sports, and you want to try to make sense of that and use it, make the experience for fans better. How would you approach that problem?

ZACK: OK, so that's a good question, because there's two major video tools that we have. And one of them is a pre-trained API. And the other one is what we call AutoML, which allows you to build a custom model. And because this is sport and based on how granular of things you want to detect, if they're generic things like a player running, that one might already be solvable with just the video intelligence API. That's pre-trained. You don't need to feed it any data. You just give it your video, and it will label it for you.

But if you're looking to do something more sports-specific, something like-- for example, I play rugby. So things that happen in rugby are tackles. You've got scores, which we call tries. And then you've got these really weird things that no generic API would be ready for like the scrum or the rucks. For that, you can collect some video footage that you already have recorded. You can feed it into the AutoML video intelligence UI tool. And you can actually go through and label it a bit like an interface you might get with a video editor where you outline segments of the video.

You do that, but you say, OK, this is where a ruck happened. This is where a tackle happened. This is where a try happened. And then you click Train, and it will train its own custom video intelligence model to automatically annotate future video clips. So those might be two different ways based on how specific you would like your video intelligence model to be.

BRIAN: OK. So a part of that, there's a huge amount of manual labor. So we'll just assume we somehow figure out how to tag all these things. So if I give you that video, what comes back from the API? What does that actually look like in concrete terms?

ZACK: When I last used it, I remember it returning the segment timestamps. So it will actually tell you the start number of seconds and the end number of seconds of each annotation. And the annotation would be whatever you told it to annotate, whether it be a tackle-- so it'd be the string tackle starts at this time and ends at this time. And that is for the video intelligence classification. So that will just tell you between this time and this time in the video, this thing happened.
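
As a rough sketch of what Zack describes, calling the pre-trained Video Intelligence API with label detection and reading back the annotated segments with their start and end times might look like the following. The Cloud Storage path is a placeholder, and this is a hedged example rather than anything built for the rugby project.

```python
# pip install google-cloud-videointelligence
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/rugby_clip.mp4",   # placeholder path
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)
result = operation.result(timeout=300)   # long-running operation; wait for it

# Each label comes back with the segments (start/end offsets) where it appears.
for label in result.annotation_results[0].segment_label_annotations:
    for segment in label.segments:
        start = segment.segment.start_time_offset.total_seconds()
        end = segment.segment.end_time_offset.total_seconds()
        print(f"{label.entity.description}: {start:.1f}s - {end:.1f}s")
```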

But we also have a version of video intelligence that does object tracking. Now, this is a really cool version, because I've actually used it myself to try to build my own rugby video tracking software. Except what I wanted to do is I wanted to use drone footage that I collected of a rugby training session and build my own rugby player tracking model. So it would actually draw a bounding box around each player. And then it will tell you, this player ran to here, and then here, and then here. It would show the whole path of where they ran.

And I was actually planning to use that to map out our moves, because we would run lines and be like, OK, this person is going to run an inside line here. This person is going to run a C shape around the other player. And I was going to use the AutoML video tracking tool to build a custom player tracker that could then, in the future when we're playing matches, automatically detect when we've run a certain play or certain move. Because unlike, say, American football where each segment of play is a single play, in rugby, it's continuous. So we can't easily stop the footage and say, oh, a play happened between here and here. In rugby, it has to continuously predict what's going on. So using that tool, the object tracking video intelligence would be useful for tracking individual player movements.
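
The pre-trained object-tracking flavor Zack mentions is the same annotate_video call with a different feature flag. Here is a hedged sketch of pulling out per-frame bounding boxes for each tracked object; the input path is a placeholder, and the box coordinates come back as fractions of the frame size.

```python
# pip install google-cloud-videointelligence
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/training_drone_footage.mp4",  # placeholder
        "features": [videointelligence.Feature.OBJECT_TRACKING],
    }
)
result = operation.result(timeout=600)

for obj in result.annotation_results[0].object_annotations:
    print(f"Tracked: {obj.entity.description} (confidence {obj.confidence:.2f})")
    for frame in obj.frames:
        box = frame.normalized_bounding_box   # left/top/right/bottom as fractions
        t = frame.time_offset.total_seconds()
        print(f"  t={t:.2f}s left={box.left:.2f} top={box.top:.2f}")
```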

BRIAN: OK, that sounds super fun and useful. If somebody wants to experiment with that, how would they get started?

ZACK: What's really cool about all the custom versions of our tools like AutoML where you have to either label your own data or we also have a labeling service where you just upload some instructions-- and we actually have human labelers that will label your data for you. What's cool with those custom tools is that we have the equivalent pre-trained versions of those. So for the example of the rugby player tracker, we already have the person detection built into the video intelligence model.

So I actually used that pre-trained API on my drone footage, and it was good, but it wasn't perfect, because I guess the generic model isn't used to people being far away, from a bird's eye view. So it was detecting maybe 80% of the players' movement. So I was actually able to build an MVP of that with just the pre-trained API. So if you're interested in doing any kind of custom video, I would recommend just throwing some video clips into the video intelligence API and seeing what comes out, because oftentimes, it already has some really cool basic tracking and annotation that could be useful.

BRIAN: Thank you very much. I'm excited to go play with that.

MARK: Yeah, thank you, Zack, for telling us about that. I think, once again, we obviously have a lot of machine learning topics here, but I think it's really helpful to get a different perspective on what tools people might start with. And again, what can people play with? Where can they go to start learning these things? Because a lot of people have a lot of invested knowledge in machine learning. And it can be a really high barrier to entry. So for people who have those business challenges but don't really have the ML background, I really, really want to see them get inspired, and start to use these tools, and figure out, hey, what is a low-code-- maybe even no-code-- way to get to solving my problem?

BRIAN: And you do a few experiments. And you get more ideas from that. And you get to share it with others. They start seeing value. And it kind of builds on itself. I do think it was amazing, we didn't actually let him know that there was anything rugby-related going on in our podcast, but he just went there right away. It was just magic.

MARK: I knew Zack would be the perfect person to talk to for this.

BRIAN: So what have you been working on lately? Has any of your stuff come out and been visible?

MARK: Yeah, we just had the Kubeflow 101 series. Stephanie and I were working on that. So if you're interested in machine learning, then Kubeflow is another interesting platform for setting that up. We've talked about it, I think, a couple of times in the past few weeks. But go check it out. It's just a really cool introduction to what Kubeflow is. And we're excited for people to get more content there, because I think people have been asking for more Kubeflow. How do we understand it? How do we get started? What about yourself? What have you been working on?

BRIAN: More Stephanie awesomeness. Actually, Stephanie and I did another Eyes on the Enterprise video. And that just came out, around rethinking VMs. The basic idea is that we all know what a VM is, but we think of it as kind of a slice of a computer. And our VMs are these same abstractions, but they're built out of a slice of a whole data center. So your networking is the data center networking. Your disks are spread out across thousands of machines, all this stuff, and it really helps you be faster, more agile. It helps you save money, respond to things quicker, stay up better. So check that out.

And then I just wanted to highlight, Yufeng is doing a series of live, interactive discovery sessions all around Google Cloud. He's just been partnering with all kinds of folks. And I did one with him a few weeks ago, but he's already got, I think, five, six, seven up now. And we'll put links in the show notes. But basically, it's a much more interactive, learn-something-from-an-expert style of live stream. And those are all up on YouTube, so check it out.

MARK: Yeah, and Yufeng's also done a number of AI Adventures videos on the Google Cloud Platform YouTube channel where he talks about a wide variety of AI and ML, different kind of definitions, how to do certain things. It's a great series. So once again, more ML content for those who have been asking for it. All right, well, Brian, always fun to work with you. And super excited to talk about all these cool ML-related projects. But next week I'm sure we'll have more great content for our listeners. So thanks for listening, and we'll see you all next week.

BRIAN: Bye bye, all.

[MUSIC PLAYING]

Hosts

Brian Dorsey and Mark Mirchandani

Continue the conversation

Leave us a comment on Reddit