AI's Impact on Media Asset Management for Studios, Sports Teams, Churches and Content Creators
Social media has changed the way video assets must be tagged and organized: teams need to turn long footage into short content quickly, which in turn strengthens storytelling and brand recognition. Are you curious about how artificial intelligence is revolutionizing media asset management? In this episode of "AI Experience," I sit down with Sam Bogoch, co-founder and CEO of Axle AI, to explore AI's transformative role across diverse industries. Sam shares insights on how AI is changing the way movie studios, sports teams, churches, and content creators handle their vast libraries of video content. Discover how AI-driven solutions are making it easier to store, categorize, and search for digital assets, ultimately improving efficiency and creativity.
Sam Bogoch is co-founder and CEO of Axle AI, Inc., a leader in the application of AI to video search. The company launched in 2018 and has since grown to over 1,000 sites deployed worldwide. Prior to starting Axle AI, Sam was Director of Product Management for Enterprise and Workgroup Products at Avid (MediaCentral and Interplay), where he and his team grew Avid's annual revenues in the space from $17M to $55M per year. Before Avid, Sam was a recognized leader in the DAM industry. He holds degrees in Physics from Harvard and Medicine from Columbia University, where he was awarded the Sandoz Prize for Research; his early interests included developing parallel processing systems for neural network simulation.
Sam Bogoch, CEO of Axle AI
Julien Redelsperger : « And I'm super happy to welcome Sam Bogoch. He is the co-founder and CEO of Axle AI, a leader in the application of AI to video search. So today, we're going to talk about AI and media asset management. You may think it's boring, but it's actually super interesting, and it could certainly inspire you to improve your digital asset management system for categorizing and filtering your videos and other assets. Thank you for joining me today. How are you, Sam? »
Sam Bogoch : « Doing great. Psyched to be here. Thanks. »
Julien Redelsperger : « Thank you for joining me. Let's start at the beginning, Sam. Why is media asset management such a big topic? And more importantly, what is it, for those hearing the term for the first time? Could you explain that to us? »
Sam Bogoch : « So media asset management, or MAM for short, is becoming a big topic. It used to be very niche. It used to be specific to, like, broadcasters and movie studios, because they were the only ones who had enough media to make it really necessary. But what's happened in the last few years, particularly with the explosion in social media, is that lots of people have tons of video. And we're increasingly seeing customers with hundreds of terabytes and even petabytes. A petabyte is a thousand terabytes. And that is not so unusual anymore. It used to be very exotic, like, wow, they have three petabytes. But now municipalities, churches, sports teams, they all have this amount of data. And it's growing very rapidly, because more and more of the material is shot in 4K or 6K. And of course, that just triples or quadruples the amount of storage needed for each clip. Plus, there's just so much more being shot, because the cameras are so much more affordable. And everybody is carrying a 4K camera in their pocket. »
Julien Redelsperger : « So media asset management is just the ability to store, categorize, and organize videos and digital assets. Is that correct? »
Sam Bogoch : « That's right. The one proviso I would say is that you don't need media asset management for the storing part. You can just store it on hard drives. You can store it in the cloud. And typically what happens with these customers that have significant amounts is that they'll store it on a network-attached device, a NAS, network-attached storage. So they can do that without a MAM, and in general, they do. One of the things about our software is it's designed to slide over the existing storage that you have, rather than dictate the storage or say you need a MAM to do this, because you really don't. It's a question of not just storing it, but being able to find it later and reuse it easily, particularly given the rapid turnarounds that are typical in the social media world. You know, you don't have weeks to find something. »
Julien Redelsperger : « Yeah, sure. And so that's where Axle AI comes into play. So what is exactly that you do and who are your clients? You mentioned churches, sports teams. Could you just give us some more details? »
Sam Bogoch : « Yeah, absolutely. So what we do is we scan what's on your storage. We preserve things like the folder structure and the naming, so we don't mess with the files. And in fact, we think that's really important. A lot of the earlier media asset management systems kind of forced you into certain patterns of workflow: we're going to put files over here, we're going to name them like this. And in general, people already have their own preferred ways of doing that. You know, if it's a TV show, they'll organize it by years and then episodes within that. If it's a sports team, they'll organize it by seasons and then games within the season, and so forth. So the organization usually comes with the territory. But what we do is we scan all that material. We make low-res proxies, previews of all the files, so that it can be viewed in our browser interface. And that way you don't have to be sitting at the storage to search it. You can search it from home. You can search it from a remote location. And you can share it with your colleagues and clients and so forth. And then once we have those proxies, obviously we make them viewable and you can tag them yourself. But the big breakthrough is that AI can now tag them to a very large degree. So transcription is automatic, plus face recognition, object recognition, logo recognition, and scene understanding, which is describing what's going on in the scene. So, you know, two people sitting at a table or, I don't know, a beach at sunset. Those kinds of general descriptions of what's happening are now possible and very descriptive with AI. So we provide all that. »
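To make the workflow concrete, here is a minimal sketch of the kind of scan-proxy-transcribe loop Sam describes, built on the open-source ffmpeg tool and OpenAI's Whisper model. It is an illustration under assumptions (file layout, proxy settings, model choice), not Axle AI's actual pipeline.

```python
# Illustrative sketch (NOT Axle AI's code): walk a storage volume in place,
# render a low-res H.264 proxy for each video with ffmpeg, then transcribe
# the proxy with the open-source Whisper model.
import subprocess
from pathlib import Path

import whisper  # pip install openai-whisper; also requires ffmpeg on PATH

VIDEO_EXTS = {".mp4", ".mov", ".mxf", ".avi"}
model = whisper.load_model("base")  # small model: trades some accuracy for speed

def make_proxy(src: Path, proxy_dir: Path) -> Path:
    """Render a 360p preview next to (never over) the original file."""
    proxy = proxy_dir / (src.stem + "_proxy.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-vf", "scale=-2:360",            # keep aspect ratio, 360 px tall
         "-c:v", "libx264", "-crf", "28",  # heavy compression, fine for previews
         "-c:a", "aac", str(proxy)],
        check=True,
    )
    return proxy

def scan_and_tag(root: Path, proxy_dir: Path) -> dict:
    """Scan storage without touching the folder structure; map each file to its transcript."""
    proxy_dir.mkdir(exist_ok=True)
    transcripts = {}
    for src in sorted(root.rglob("*")):
        if src.suffix.lower() in VIDEO_EXTS:
            proxy = make_proxy(src, proxy_dir)
            transcripts[str(src)] = model.transcribe(str(proxy))["text"]
    return transcripts
```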
Julien Redelsperger : « So does it mean the AI is able to watch the videos, understand what it is, and categorize that video saying, like, this is a specific scene that happens in a specific context, etc.? »
Sam Bogoch : « Well, the context is still a bit of a challenge for AI. It's getting there. I would say over the next couple of years, that, like so many other things, is going to be a solved problem. Today, what AI will tell you is what's happening right now in the scene. In terms of figuring out a storyline throughout the overall movie or TV show or what have you, it's still a work in progress. But meanwhile, you've got the transcript, right? So you can very quickly jump to: I need to find the place where he says, "great to see you." Okay, let's find that. I need to find the place where there's an argument going on, or if it's sports footage, I need to get to where the goal happened. These things are doable today. And you can't imagine how much time people used to have to spend scrubbing through the material. They still do. 99% of our industry is still doing this the old-fashioned way, which is to say they have essentially no descriptive metadata to get their work done. »
Julien Redelsperger : « Yeah, that was my next question, actually. Before the AI era, how did organizations with large video media libraries manage the footage? Did they have people actually watching the videos, transcribing them, and making notes? »
Sam Bogoch : « Exactly. Yeah. Tagging. That was the dream. The reality, when I was at Avid, which is a company that handles the biggest media companies, I was the product manager for their MAM. And I would go visit these companies and they would all talk very optimistically about all the tagging they were going to do. But when you came back a year or two later, they would admit that they really hadn't had time to get to that, because the editors are busy editing. The camera people are out shooting. The producers are busy producing. No one is taking out their notebook and logging. Now there are exceptions to that. Sports has very detailed logging. Interestingly, one of the places that I visited that always had the best logging was the awards shows, like the Oscars. They have the red carpet ceremony. The company that covered that had a team of people who were literally logging every aspect of the interactions on the red carpet. So it's like, George Clooney steps forward, George Clooney looks left, George Clooney looks right. And they were just typing, typing, typing. And, you know, George Clooney is wearing Dolce & Gabbana shoes. Whatever they could come up with was going into the data, but that took a squad of people. And it was only really sustainable for one evening a year with the Oscars, maybe the Grammys and the Tonys. You couldn't afford to staff that level of intense data ingest 24/7. One more example: a major global news organization, one of the big news aggregators and field news organizations, where you would expect this to be completely covered. How did they do their archiving? They basically just grouped everything by day. And I said, you don't have anything that says, oh, this is Barack Obama giving a speech? No, we just group it all by day. And then if it's important, we go back and we look up what day it was. So you would expect that they would completely have this covered, and they don't. And the same goes for major movie studios, major TV networks. It's been kind of an elusive dream, because nobody has the time and, you know, people are expensive, their time is valuable. And so this is a job that literally never got done until AI. »
Julien Redelsperger : « So why is it important to tag videos to explain what's going on on the screen? What's the value behind it? »
Sam Bogoch : « So that's a very good question. Historically, there wasn't that much value. In fact, the reason that news organization was only grouping footage by day was that the chances they'd have to go back and find it were slim. Maybe it's for an obituary piece: somebody just died, they need to find an event that happened. It just didn't come up that often. Social media has changed everything, because it has changed the nature of dialogue to where it is interactive and snippet-based. And suddenly you have to find the relevant soundbite and push it out there. A great example from the last election cycle here in the US is Elizabeth Warren's presidential campaign. Our software was actually being used by several of the presidential campaigns last time around. But Elizabeth Warren in the primaries, her team was capturing all this audio. They were using our software to do the automatic transcription. And she would hold these town hall events that would go on for two, sometimes three hours, in all parts of the country. They did over 100 of them. So it was kind of an ask-me-anything. You could ask Elizabeth Warren about any topic and she would proceed to answer it. Well, you go on social media and people are constantly trolling and complaining and saying, oh, I bet Elizabeth Warren doesn't even have a policy about, I don't know, Sudan. Or I bet she doesn't have a clue about Medicare reimbursements, or whatever. This allowed her team to very quickly search these Q&A sessions, find the right soundbite that was exactly responsive to the question, and post it back, usually within an hour or so of the question being asked. So instead of having to say, oh, I'm sure we have a position on that, they could say, no, here is actually what Elizabeth Warren has to say about Sudan. And it was extremely powerful. And that's just the political angle. But you can think about the same thing with sports teams, not only for game footage, but interviews and fan footage. All these things that you can splice together for promotional purposes and for social media, just to be responsive to what people are talking about. »
Julien Redelsperger : « I'm curious about churches. You mentioned that just a few minutes ago. What's the case here? »
Sam Bogoch : « Well, in the US, religion is an industry. You could argue that it is in other places, too, but here it's explicitly an industry. And so churches can get very large. But during COVID, they were effectively shut down, right? Some of these churches have thousands of seats. They have media, you know, huge large-screen displays. Everything was shut down. So that meant everybody was participating by streaming. And that forced them to really focus on production values and the video content. And that became a primary way of fundraising and of connecting with their audience. Now fast-forward to today: of course, they are coming back in person, but they also have aging demographics. Many of these people find it difficult to come in person, so they're still watching the stream. So you have a video culture around worship. And then, like any video culture, they want to be able to do highlights. They want to be able to say, "Here's our pastor with something to say about premarital sex," or whatever the topic of the moment is. And they need these sound bites. So, again, it comes back to the fact that years ago, they might have televised these sermons, but they would not have been slicing them up in this very intentional way and micro-targeting them on social media. And today they are. So it cuts across government, sports, churches, corporations. We have customers like Kellogg's and Coca-Cola that use this for their marketing campaigns. Again, it's all about the micro-targeting, the ability to grab content. And then there are overlaps. For instance, we do logo detection in our software, and for sports teams, it's critical to be able to go to their sponsors, because inside the stadium, you'll have signs and so forth. The sponsor might say, "Let's say it's Coca-Cola. I really want to see great action happening in front of the Coca-Cola logo." Right? So then they have to go back. And with our software, you can find all the segments where interesting stuff happened in front of a Coca-Cola logo. Then you can snip those out and turn them into a sequence or a promo. Or you can just send them to the client and say, "I don't know what you have in mind, but here are 10 examples of cool stuff happening in front of the logo." »
Julien Redelsperger : « And does your system also do speech recognition? Do you do like transcription? »
Sam Bogoch : « Yes, the transcription is built in. And it's very high quality. You know, that's another thing. Two, three years ago, the best you could get for transcription was 85 or 90%. Now it's upwards of 95% for the major languages. And even for obscure languages, it's getting, you know, above 90%, let's say. It's all a question of having a big enough data set and training it. But the technology is definitely there. »
Julien Redelsperger : « So talking about technology, this is a show, of course, about artificial intelligence. What is the technology behind it? Do you use, I don't know, open source model? Do you have your own tech? Like, could you explain a little bit how it works? »
Sam Bogoch : « Yeah, we do a mix, but it's heavily driven by what's out in the open-source community. I think it's insane to think that you're going to come up with some better algorithm than what tens of thousands of brilliant engineers have been slaving over for the last few years. So we use PyTorch, we use OpenCV, we use all these frameworks. And then it becomes as much about curation of the technology as about inventing something new, right? You're going to need some kind of object categorizer. You're going to need something for time-domain analysis. But these are things that people have worked on extensively, and they're primarily open source. There was that moment last year where OpenAI suddenly rocked the world with ChatGPT, and that was not open source. But in the year since then, there are dozens of LLMs that are very competitive with OpenAI's and that are open source. And not only open source in the sense that you can cut and paste them, but also in that you can train them further, that you can use them as components of a larger schema. So for instance, when we do searches in our software, it's a combination of Elasticsearch, traditional-style search across text, and then vector-based semantic search across some of these models. And what's your search result? Your search result is kind of a mashup of those two modalities. And I have no doubt that it's going to continue to expand in the future. So I guess for us, it's really about delivering the best results to our customers, and doing it with the best value also. Because if we were to gear up some 30-person engineering team to solve this problem, maybe we could get it a little bit better, but we would have to charge five or ten times as much to do so. Today our whole company is 25 people. Of those, we have machine learning people, we have computer scientists, we have UI and UX people. So it's a broad engineering team focused on really solving the problem. And AI is a key component, but it's not the whole problem. »
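As an illustration of the two search modalities Sam mentions, the toy sketch below blends a keyword score with a vector-similarity score. The rank-bm25 library stands in for Elasticsearch-style text search, and sentence-transformers supplies the semantic embeddings; the libraries, the 50/50 weighting, and the example clips are all assumptions, not Axle AI's implementation.

```python
# Toy hybrid search (illustrative, not Axle AI's implementation): blend a
# BM25 keyword score with cosine similarity over sentence embeddings.
import numpy as np
from rank_bm25 import BM25Okapi                        # pip install rank-bm25
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Hypothetical scene descriptions produced by AI tagging.
docs = [
    "two people sitting at a table arguing",
    "a beach at sunset with seagulls",
    "the striker scores a goal in front of the Coca-Cola sign",
]

bm25 = BM25Okapi([d.split() for d in docs])
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def hybrid_search(query: str, alpha: float = 0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * semantic score."""
    kw = np.asarray(bm25.get_scores(query.split()), dtype=float)
    if kw.max() > 0:
        kw = kw / kw.max()  # normalize keyword scores to [0, 1]
    sem = doc_vecs @ encoder.encode(query, normalize_embeddings=True)
    blended = alpha * kw + (1 - alpha) * sem
    return sorted(zip(blended.tolist(), docs), reverse=True)

# "soccer goal near sponsor logo" matches the third clip semantically even
# though it shares almost no keywords with it.
print(hybrid_search("soccer goal near sponsor logo"))
```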
Julien Redelsperger : « And so if I have a video media library and I want to use your tool, that means your software, Axle AI, will technically watch all the videos, all the footage, and will be able to detect different objects or characters? And is it like a SaaS model, or is it something you install on a server, on a machine? »
Sam Bogoch : « Very good questions. First of all, it roughly operates at real time on a per-processing-unit basis. So between making the proxies and then analyzing the proxies for all these different modalities, if you have an hour of footage, it's going to take a couple of CPU cores and some GPU about an hour to process it. And of course, with today's multi-core servers and the bigger GPUs, you can get into multiples of that. So it's very reasonable to think you could get to 10x real time in the aggregate by having a big enough server and a big enough GPU. And our customers decide for themselves how much they want to spend on this and how quickly they want to get results. If it's a news organization and they want to process everything within minutes of it coming in, then they may want to be more aggressive. If they're just combing through an archive and they just need results over the next six months, then you just do the calculations and figure out how quickly it needs to go to catch up. And then once it's caught up, as new content comes in, we can process it at real time or faster. »
Julien Redelsperger : « And how do you deal with data privacy? »
Sam Bogoch : « So here's the cool part. Everything we do runs on premise. We have a cloud option for our software where people can save their stuff in the cloud and analyze it there. But even there, we're very rigorous about fencing off each client's data from each other client's data. But the ideal is that you run this on premise, because the cloud is actually quite an expensive place to do this: you're talking about hundreds of terabytes of data, and parking that in the cloud is still very expensive. Whereas having it on premise on a network-attached storage device is not expensive; you're basically just paying the electricity bill once you've bought the device. So our customers do the math, and the cutoff for us is typically in the neighborhood of tens of terabytes. Let's say you have 20, 30, 50 terabytes; anything above that, it starts to be cost-effective to do it on premise. And you have the added advantage of privacy and the ability to customize your data model much more than you can with the cloud product. However, there are people who want to try before they buy. They may just have a little bit of footage from a few events, and they want to get a feel for what they're going to see with this kind of software. For that, we have Axle AI Cloud, and that's just a pay-as-you-go model. It's basically $20 per terabyte per month, and each terabyte also comes with a user. So let's say you have five terabytes: you'd be paying $100 a month and you'd get five users. And that's good enough to cover what a lot of small teams are going to need. »
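The sizing and pricing arithmetic Sam walks through fits in a few lines. The real-time processing assumption and the $20 per terabyte per month figure come straight from the conversation; the archive size in the example is hypothetical.

```python
# Back-of-the-envelope sizing, using the figures quoted in the conversation.

def catchup_days(archive_hours: float, realtime_multiple: float) -> float:
    """Days to clear a backlog when processing runs 24/7 at a given
    multiple of real time (one hour of footage per hour, per multiple)."""
    return archive_hours / realtime_multiple / 24

def cloud_monthly_cost(terabytes: int, rate_per_tb: float = 20.0) -> float:
    """Axle AI Cloud as quoted: $20 per terabyte per month, one user seat per terabyte."""
    return terabytes * rate_per_tb

# A hypothetical 5,000-hour archive on a server running at 10x real time:
print(f"{catchup_days(5000, 10):.0f} days to catch up")  # ~21 days
# Five terabytes in the cloud: $100/month, with five users included.
print(f"${cloud_monthly_cost(5):.0f}/month")
```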
Julien Redelsperger : « Okay. And so you founded Axle AI back in 2018. At the time, we didn't speak much about generative AI. ChatGPT launched in November 2022. How much AI did you use at the time and what changed since the big wave of generative AI? »
Sam Bogoch : « Very good question. At that time, we were relying on third-party models that were not open source. So our first releases actually were with Microsoft Video Indexer, which was cloud-based and proprietary. Pretty good at the time. However, it was expensive: they charged $10 per hour of footage to process it. And there were privacy issues. So if you had a thousand hours of footage, it would cost you $10,000 to process it there. And by the way, those economics have not changed; that's still their pricing. Amazon Rekognition has similar pricing. Google has similar pricing. And that's just to process it in the cloud, not to store it on a long-term basis, which is why the cloud is not a great place to do this stuff at scale. But that's how we started. And then we were doing the math with our clients. We had some very good proofs of concept. But they all came back and said, "I can do this for 20 hours of footage, but not for my entire library." So then we set about developing on-premise models that we could manage and improve ourselves. We actually acquired a company in Latvia that focused on face recognition, and they've become kind of the core of our machine learning team. So that's worked out really well. And since then, to the face recognition we've added object and logo recognition, OCR for recognizing characters on screen, and scene understanding. And we have modules that do things like sports highlights. But essentially, all of it can be deployed on-premise. And for the larger media customers, that's a requirement, I would say. »
Julien Redelsperger : « Okay. What is the most difficult thing for an AI to recognize in footage? »
Sam Bogoch : « Well, I think you pointed it out earlier, the broader context, the storyline, figuring out the meanings of things. AI doesn't really understand the meanings of things. »
Julien Redelsperger : « Like the emotions, if someone is happy or sad or... »
Sam Bogoch : « Superficial emotions are actually not that hard to do. You can pick up smiling and laughing and crying and those kinds of things. So that's not too bad. But deeper understanding is still elusive. And I think it's generally agreed within the top scientific establishments in this area that the large language models are great at simulating what a person would say, right? Or simulating what a person could create. But they are completely devoid of deeper understanding of the world. Even when they include physics, right? You can model the physics so that you can do really nice renderings, like Sora does. But that's not the same as actually understanding what's going on. So you're still only a few steps up from kind of brute-force machine learning. And I think the big challenge over the next five or ten years is to evolve that towards true AGI, true machine intelligence, where they can tell you what moved them about the show or what was really great about a particular interaction. I mean, we're still years away from that. How many years is open to debate. I just saw Yann LeCun saying it's in the neighborhood of maybe five or ten years. It could be that we'll approach aspects of AGI sooner than that. And so there are definitely near-term signposts that we need to be concerned about. AI could still be quite dangerous before we achieve AGI. And all of this stuff has to be factored in. But to your question about media, that's what's missing: that deeper intelligence, as it is in every aspect of AI. »
Julien Redelsperger : « Sure. And so when you talk to your clients, when you talk to TV producers, Hollywood studios, churches, sports teams, what do they think about AI? What's their mindset here? Is it a big transformation for them to achieve, or are they super happy and super excited about AI in media management? »
Sam Bogoch : « I think AI overall is very daunting for the media industry. There are huge areas of concern, you know, all the way to Scarlett Johansson saying that OpenAI took her voice, right? And they probably did go far enough in that direction, in terms of trying to copy it, that it's probably legal material of some kind. I think there may be a lawsuit there. But, of course, all the big media organizations were already freaking out over this stuff, the New York Times and so forth. And then you just have to look at the strikes last year, where whole sectors of the creative economy were really concerned about likenesses and about how likenesses could be repurposed. And that's even with today's technology. We're not talking about three or five years from now; just today, there's ample room for abuse of this stuff. So everybody is on the defensive and ready to be very concerned about AI taking their jobs and otherwise messing with them. However, this particular case of using it for search is one where there's never been enough manpower. There have never been enough hours in the day. It's never been viable to do, despite everyone wanting to do it. So it's actually a big unmet need that AI solves, and there's really no downside that anyone's able to see. So they've been very receptive. I mean, we were just at the NAB show in Las Vegas a month ago. We had a 10-by-10 booth, three meters square, and we had over 400 visitors to the booth in four days. It was just a deluge. And people were coming up saying, "I need this. I've been dreaming about this forever. Where were you last year? I need this now." So it's a big unmet need, which is tremendous. And again, there's relatively little downside to it compared to many other, potentially more troublesome, uses of AI. Now, obviously, broadly speaking, we're not exempt from the concerns. Everybody's worried about the meaning of AI. It's widely viewed as disruptive. And there were, I would say, hundreds of booths talking about AI at the show. It was the number one topic of the year. And that's just this year; come back next year, there'll be 50% more booths talking about AI. Basically, if you're in the media industry, you have to have a position on AI or you've been asleep for the last few years. »
Julien Redelsperger : « Do you think this is too much? Do you think AI might somewhat be like a bubble that will just explode because everyone is using AI as a marketing term these days? What's the meaning behind it? I mean, I know you are using true AI in your solution, but that may not be the case for other organizations or companies. »
Sam Bogoch : « Yeah, well, here's the thing. I think the barriers to entry for using AI are dropping rapidly. Just this week, Microsoft announced AI PCs, which have significant neural processing capabilities built in on the motherboard, and they're putting all this stuff into Windows. Meanwhile, Google made their announcement last week, and Apple is going to make their announcement in June. So everybody is going to have AI at their fingertips. It's going to be a no-brainer to say that you use AI in your work, because we're all going to use AI in our work. Whether a company is an "AI company" is in the eye of the beholder. To what extent is AI actually formative in what they do, or is it just a gimmick tacked onto some existing business? That is to be determined. But to your question about a bubble: we've seen waves like this over the last few years. You know, Web 3.0; to this day, I'm still not exactly sure what that one was. Blockchain, right? These things come and go. It's the buzzword of the year, and if you're doing a startup, you have to use that buzzword or you won't get funded. AI is, in some ways, the latest of these bubbles or buzzwords. However, the big difference is that we're talking about a multi-year trajectory here. This is not a case where the impact is questionable and it could come and go, like Web 3.0, which came and went before anyone even figured out exactly what it was. This is not going to be one of those. Over the next decade or two or three, this is going to be like electricity. It's like the Internet. It's one of these fundamental changes that will infiltrate every aspect of your life. And, therefore, I don't want to say a bubble is impossible, because you can always overshoot with the drama and the fear or the expectations. And we may be in such a moment right now, particularly as it applies to AGI, which I don't think is here yet. Again, we still have a few years to figure out when it's coming, if it's coming. But is AI practical today? Is it doing useful stuff for lots of people? Absolutely. So the only issue is whether you've overshot or undershot in terms of your expectations. But this is a fundamental shift in the global economy, and it's not going away. »
Julien Redelsperger : « Okay. I just had one last question, about sports and the way sports teams are using your solution. Are you able to filter and tag the action on the field, like, for example, a touchdown in football or a penalty in soccer? »
Sam Bogoch : « Absolutely. »
Julien Redelsperger : « And if so, how does the AI learn about the rules of a sport? »
Sam Bogoch : « Well, okay. Very good question, and probably best understood at two different levels. The kind of AI that we're applying right now for sports highlights is just literally what it sees on the field. If a goal had to be called back because of an offside or something, it's probably not going to figure that out. It's going to see a goal, and then it's going to see the referee, and it may not link those two events. Look, whatever I describe as the state of the art, assume that in six months it will be a solved problem, because that's how quickly things are changing. But at this moment, it doesn't have a fully causal understanding of the plot, if you will, of a game. But it is able to recognize on-field actions, and it is able to do things like track individual players based on their faces and the numbers on their jerseys. It's able to know what's a corner kick, what's a penalty kick, what's a goal. So a huge amount of useful information. And mind you, at the top levels of professional sports, those things are already logged and available in a data stream. But not everybody has access to that data stream, and it's usually paid, right? You have to subscribe to a data feed service to have the game annotated that way. Whereas here, particularly for high school and college sports, those data feeds may not be available. »
Julien Redelsperger : « And what about less common sports? I'm thinking of something like curling, or other sports that don't get much media coverage. »
Sam Bogoch : « So I think in those cases, it's necessary to train a model, for the most part. You can always do simple things like just checking the audience noise levels, right? If there's a big cheer, something probably just happened. So for a really obscure sport, that might be the easiest thing to do. But it's also quite straightforward to train these models on actions, on specific shots and the things that happen in those shots. So I think it's just like with the languages: the coverage in English is excellent. You're up around 95 percent, and it even covers accents well. If you're trying to get to a less common language like Uzbek, you may not have that coverage yet, because Uzbek today may be where English was 10 years ago. But at the rate of development, hey, there are lots of smart people in Uzbekistan. They'll be working on this. And particularly with Whisper and some of the open-source models, I'm pretty sure there's going to be very good, high-accuracy Uzbek. The same goes for the sports, right? Badminton, curling, whatever. You're going to see some good highlights coming out of those, and they may already be available. It's a big world, and people are solving these problems concurrently. »
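Sam's "listen for the crowd" fallback is simple enough to sketch: flag the moments where short-window audio energy jumps well above the baseline. The file name, window length, and threshold below are illustrative assumptions, not a production highlight detector.

```python
# Crude crowd-cheer detector (illustrative only): flag timestamps where the
# RMS energy of the audio track spikes well above its median level.
import numpy as np
import librosa  # pip install librosa

def cheer_moments(audio_path: str, win_s: float = 1.0, factor: float = 3.0):
    """Return timestamps (seconds) where windowed RMS exceeds factor x median."""
    y, sr = librosa.load(audio_path, sr=16_000, mono=True)
    hop = int(win_s * sr)  # non-overlapping one-second windows
    rms = np.array([
        np.sqrt(np.mean(y[i:i + hop] ** 2))
        for i in range(0, len(y) - hop, hop)
    ])
    baseline = np.median(rms)
    return [i * win_s for i, r in enumerate(rms) if r > factor * baseline]

# e.g. cheer_moments("curling_match.wav") might return [312.0, 313.0, 1407.0]
```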
Julien Redelsperger : « So you founded your company in 2018. We are now in 2024. What is the biggest challenge that you've faced as an entrepreneur, as a CEO, as a founder over that time? »
Sam Bogoch : « So there are two different answers. Prior to a year ago, I would have said the biggest challenge was that we were trying to pioneer this technology and that people were not necessarily ready for it. Today, because of the hype around AI, people are ready for it. It's been a huge change: the incoming business is phenomenal, and our business is doubling every year. But then there's a different challenge, which is that there is so much money flowing into this space, and there are these huge LLM hyperscalers, whether it's OpenAI or Meta or Google. Does what we do become a commodity? In five years, could anybody do this? Because the models are so readily available and it's a solvable problem. Today, it's right on the edge of being a solved problem, which is probably perfect for a startup. So I don't know the answer to that. I do know that the focus of our team is industry expertise. We're very, very good at applying the state of the art to the problems of our customers, and I have no doubt that those problems will continue to exist and need solutions. But we may, over time, find ourselves developing less and less of the core technology and focusing much more on things like the user experience and the suitability of our front end to the actual workflows and use cases. For instance, what editing software do they use? We already have a panel for Adobe Premiere, which is the leading editing application. But just in the last few years, another package, Resolve from Blackmagic, has become very popular. So now we're doing a drag-and-drop panel for Resolve. And if you go deep enough into these things and you solve the actual problems of your customers as they work every day, then just the existence of a big generic technology that could do the same thing is not a direct competitor. So we're trying to build our moat around being application-specific and really making sure that we're solving our customers' problems completely. But that is our biggest challenge. We need to double down on that while continuing to harvest all this amazing open-source activity for our R&D, so that we don't need a development team of 20 people on the machine learning side. »
Julien Redelsperger : « And so how do you keep up with everything that's going on in the AI field? Because every day we have a new announcement, a new feature, a big new thing. It can be overwhelming sometimes. I mean, that's my feeling. What about you? »
Sam Bogoch : « I totally feel that way. I met with our team earlier this year, the developers, and I told them to expect that this year a week is roughly equivalent to a month previously. And next year, maybe a day will be equivalent to a week. It's just accelerating so fast. And coming back to that question about a bubble, I don't think we have ever seen these levels of investment in the global economy. You just have to look at Nvidia's stock price and see how many GPUs they're selling. It's not that more video games are being played this year; all of that growth is coming out of the AI space, which means that all of those GPUs are being put to work developing new models, running new inference, being used in new applications. This level of investment in a technology in a short period of time is pretty unprecedented. And I was around when the internet kicked off, and it was like, "Okay, it's a cool thing, but it took several years to percolate." And then, "Oh, gee, you can do shopping over the internet," maybe five years in or something. And all these things, maps or whatever, they popped up. But the product cycles were measured in years. Now it's absolutely months, headed towards weeks. It's intimidating. And all you can do is get up and open your browser in the morning and hope you can keep up with what's going on. But it is also amazing, because if there's a problem you're looking to solve, chances are somebody out there has worked on it. And then there's a significant chance that what they've done is open source. And a lot of times you can contact them and collaborate directly. We're getting involved in an open-source project right now where we wanted to see a feature implemented a little sooner than it looked like it was on their roadmap. And I just contacted one of the leads and said, "Hey, can we get involved?" And they're like, "Oh, great. We've been hoping to solve that." So it's very participatory. It's very democratic. And I actually think it's going to keep moving faster, and that closed systems, paradoxically like OpenAI's, are going to struggle. Because this is a big wave of a lot of smart people from all over the world who are... »
Julien Redelsperger : « How do you do that? Do you read? Do you listen to podcasts? Do you watch documentary? What's your secret ingredients? »
Sam Bogoch : « I subscribe to a bunch of feeds in the space. There are people who curate this stuff on a daily and weekly basis. Hugging Face is a great resource to see what's trending. It's a whole world, right? And the deeper you go into that world, the more information you can get. And then the problem is you have too much information, and what you really want is expertise. I'm not in a position to judge whether a new LLM is really something that we should be using or adapting. But there are really smart people on my team who I can just ask, "Hey, should we be looking at this?" And they're like, "No, no, no, no. It uses way too much GPU. It's not practical." You're basically looking for reasons to throw things out, because there's so much there. And then what you end up with is, "Yeah, this might actually work. Can we adapt it? Can we participate?" And given the pace right now, it's also a question of, "Is this going to last?" Because when you pick a technology, you'd like to be able to keep using it for more than a month or two. Otherwise, you've wasted your time. We've already seen this in other areas, like JavaScript frameworks for UI. It seemed like for a while there, every few months there was a hot new JavaScript framework, right? And if you picked the wrong one, two years later you couldn't find anyone to maintain the code, because they'd all moved on to the latest cool new one. "Oh, React. React is so 2021. We're all using Vue now." It's like, "Well, wait a minute. We just got this product built." Luckily, React is still there, but there are other technologies that are not still around because of the quick obsolescence. And AI is like that, but fivefold. These things are coming and going in very rapid waves. »
Julien Redelsperger : « All right. Perfect. So thank you so much, Sam. So at the end of each episode, the guest must answer a question posed by the previous guest. Are you ready? »
Sam Bogoch : « I'm ready. »
Julien Redelsperger : « Perfect. So after that, you'll have the opportunity to ask a question of your own for the next guest. So here's your question, courtesy of Adam Ahmad. He's the CEO and founder of Kia, a company that uses voice AI to streamline restaurant phone orders. We can listen to his question right now. »
Adam Ahmad : « There's a big debate about this around the world. And the question is, does pineapple go on pizza? »
Julien Redelsperger : « This is absolutely not a tech or AI-related question, but it's still a philosophical one, and an important one. What's your take? »
Sam Bogoch : « Well, de facto, it does, because people like it and they eat it. I am not one of those people. It's just a personal preference. I hate pineapple pizza, and I think it's an atrocity. But in the wonderful world that we live in, luckily, there are enough people who disagree with me that pineapple pizza is a thing. And I don't begrudge them that desire. So I'm more of a classic margherita person; you know, give me a little bit of basil and I'm good. But again, it's a wonderful world we live in, and more power to them. »
Julien Redelsperger : « Perfect. I'm not going to ask which city between Chicago and New York has the best pizza. I know this is an ongoing debate in the U.S. »
Sam Bogoch : « It is. I'm a New Yorker, so you know how I'm going to answer. »
Julien Redelsperger : « All right. Perfect. Thank you, Sam. So now, what question would you like to pose for the next guest? »
Sam Bogoch : « What is your answer as to how many years to AGI and why? »
Julien Redelsperger : « Perfect. It's simple, it's effective, and it's super interesting, actually, because it's going to be a game changer, I'm sure. All right. Perfect. Well, Sam, it's been an absolute pleasure speaking with you today. Thank you for joining me. »
Sam Bogoch : « Thank you. It's been my pleasure. Thanks a bunch. »
Julien Redelsperger : « And that wraps up another episode of the AI Experience podcast. Thank you for tuning in and we'll see you next time. »