Tuesday, December 19, 2017

Interview with Qualcomm’s Gary Brotman, Part 1: Hexagon DSP and Working with AI

Qualcomm is at the forefront of artificial intelligence computing on mobile devices, with many advancements in recent years and many more to come with the launch of the Snapdragon 845. Our very own Editor-in-Chief Mario Serrafero recently had the opportunity to interview Gary Brotman, a product director at Qualcomm who heads up the company’s Artificial Intelligence and Machine Learning Product Management.

In this first part of a two-part series, we learn about Qualcomm’s current AI efforts and plans, the challenge of closing the AI adoption gap, and the role of open source.


Mario Serrafero: It’s clear that Qualcomm is putting a lot of emphasis on heterogeneous computing, the DSP in particular, and HVX specifically. All this stuff [is aimed at] machine learning, but consumers at large, and even enthusiasts and reporters, still primarily look to CPU and GPU figures when it comes to these events. Do you see this changing? And what measures is Qualcomm taking to properly explain and spread information about the DSP and its role in artificial intelligence?

Gary Brotman: I’d say with the Snapdragon 845 specifically, we are changing our attack [strategy] with respect to how we push HVX and this technology with AI. The DSP is a primary compute element for AI because of the nature of its architecture — it just makes perfect sense. We’re getting an eight to ten times improvement compared to just using the CPU, and the energy savings are considerable.

Suggested reading: Why the Hexagon 685 DSP is a Boon for Machine Learning

From a promotion standpoint, when we look at the developer community, we’ve got a number of different ways that we reach out. You know, our audience is fairly well-defined, and I don’t want to say finite, but if you look at mobile, it’s pretty finite. Those are the developers that would cater to our OEMs first and foremost. We have a pretty robust roster of ISVs (independent software vendors) and development partners doing things with computer vision.

The DSP and HVX ecosystems go pretty deep, because [they’ve] been around for a while. Actually, it started with the audio DSP and then worked its way into computing. So that ecosystem for the DSP is big relative to the emerging AI ecosystem specifically. But I wouldn’t necessarily say that there’s anything we’re doing today independent of, say, the Hexagon SDK or the open-sourcing of Hexagon NN, the neural network library. Most of the efforts to push that have been through relationships with folks like Google, tagging on to the TensorFlow ecosystem. We do ship all these SDKs with the neural processing engine or Hexagon NN; they’re all part of the development kits that we ship to anybody who orders them. They’re on our developer portal, so they are freely available.

We are looking at, in 2018, the different ways that we can bolster our developer relations activities. The most mature [developer community] that we have today is the gaming community; we’re pretty well embedded in that. AI is emerging, so we’re starting to invest a bit more in that over the next 12 months […] [W]e try to do as best we can with the documentation and how-to [guides] that we have. I think now, actually, it’s a matter of doing more aggressive demand generation.

Q: Cool! So it’s a relatively new technology, and there’s still kind of no set way to present these advancements. Some companies like Huawei and Apple [advertise] operations per second — trillions of them. With traditional CPUs and GPUs, we’re used to having these metrics, FLOPs, and so on. How can we have a way to demonstrate the year-over-year machine learning advancements when it’s something that happens in the background and not something user-facing?

A: Great question! So I was just having a conversation with somebody earlier about benchmarking; there’s no standard benchmark, [and] we use the numbers too. It’s like looking at how fast you were on an Inception v3 network. That’s the common standard for classification, but that’s not a use case. It’s just a network type: you can feed it in, see how it runs, and out pops your inferences per second, or per minute, depending on how big a number you want to share with your marketing team. That alone doesn’t tell you enough either, because inferences per second don’t give you a power number […] and then there’s accuracy. You might have bumped up the FPS, but [that] might have come at the cost of something else. We’re still trying to figure out what the right approach is, but it’s going to be… power, performance, and accuracy. Those are going to be the three. For us, too, there’s also an area cost that’s separate from benchmarking. Some folks have just incredibly fast hardware, but it’s big and power-hungry. Great performance, but not good for cell phones, not in a small embedded device.
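Editor’s note: to make “inferences per second” concrete, here is a minimal sketch of how such a figure is typically produced: run a fixed number of forward passes and divide by wall-clock time. The stand-in “network” below is a single matrix multiply with illustrative sizes, not Qualcomm’s benchmark code, and, as Brotman notes, the resulting number says nothing about power or accuracy on its own.

```python
import time
import numpy as np

def measure_inference_rate(run_inference, batch, warmup=10, iters=100):
    """Time repeated forward passes and report inferences per second."""
    for _ in range(warmup):        # warm up caches/allocators before timing
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    return iters / elapsed

# Stand-in "network": one matrix multiply sized like the final layer of an
# Inception v3-style classifier (2048 features -> 1000 classes).
weights = np.random.rand(2048, 1000).astype(np.float32)
features = np.random.rand(1, 2048).astype(np.float32)

ips = measure_inference_rate(lambda x: x @ weights, features)
print(f"{ips:.1f} inferences/sec")
```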

Snapdragon 835: DSP vs GPU vs CPU in ML applications. Source: Qualcomm

So those are the three metrics. People talk about TOPS and GOPS and all that; that’s misleading […] those are the total operations that are possible given the nature of the hardware, but that’s not effective results. What you need is something that sort of normalizes between, let’s say, TOPS and inferences per second, or something along those lines. Find some way of normalizing that data. I don’t know if you’re going to solve the problem of marketing numbers versus what’s really happening. I’d expect that in the next six months […] some sort of benchmark will be emerging. We’re working on some of our own, but [what’s needed is] some way that the industry can actually measure and say, “OK, yeah, here’s an accepted classifier, detector, whatever.”
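Editor’s note: the normalization Brotman describes amounts to converting a measured inferences-per-second figure back into effective operations per second and comparing that against the marketed peak. A toy calculation, with every number hypothetical:

```python
PEAK_TOPS = 3.0             # hypothetical marketed peak, trillions of ops/sec

OPS_PER_INFERENCE = 11.4e9  # roughly one Inception v3 forward pass,
                            # counting each multiply-accumulate as 2 ops
MEASURED_IPS = 120          # hypothetical measured inferences per second

effective_tops = OPS_PER_INFERENCE * MEASURED_IPS / 1e12
print(f"effective {effective_tops:.2f} TOPS, "
      f"{effective_tops / PEAK_TOPS:.0%} of the marketed peak")
```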

Q: Area would be something [users] would have to factor in on their own — the consumer ultimately doesn’t care about something being big or small, but they do care about the resulting performance and the resulting power reduction.

A: Correct, right. The three key metrics are power, performance, and accuracy. And accuracy, depending on the use case, could be critical.

Q: Right right, or if it’s like 90 percent for something that’s not that critical, that’s fine too.

A: Are you looking for cats or human beings on the road with your car?

Q: Right, precisely!

A: There are two different metrics. Tolerances are a better way to put it.

Q: Another thing — and this is just kind of one of my pet peeves with machine learning — like Travis [Lanier] said in the keynote, “Vector math is the foundation of deep learning.” A lot of people are confused because they hear names like “neural processing units” and think it’s something different, something qualitatively special. The question is — a lot of reporters do have this almost-mystical view of AI. Do you think it ultimately affects the consumer perspective on all these “neural” processing units?

A: That’s a really good question. It’s not clear how an OEM is going to market some sort of unique hardware without a demonstrable use case to back it up. I think it’s really gonna be more around the use case and the benefit that they get from that hardware. It’s too soon — I couldn’t predict how others are going to do it, maybe part of their consumer marketing messaging. Is this the key to AI? Is it this dedicated acceleration? Or is it really going to be just the use case?

Apple does a lot, Huawei does a lot, in terms of promoting a unique piece of hardware. But ultimately, for us, I think it just comes down to what we can make possible on the chip. To your point about these being big math engines […] For us, it comes down to – and I’ll refer to what Keith said today – “Where there’s compute, there will be AI.”

Q: Yeah, like your SDK allows you to run it on a [CPU] core, or the DSP, or a combination.

A: Right, it’s your choice, and it depends on the concurrencies on the system, and on the power/performance profile of the use case. We believe that it’s not one-size-fits-all; it is a heterogeneous compute problem. Heterogeneity is the key to making on-device AI actually interesting. So from a marketing standpoint, I think it’s too soon to tell whether marketing dedicated hardware just by itself is going to be appropriate. I think ultimately, if the use case is compelling enough, [then] great.
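Editor’s note: that heterogeneous-compute choice is usually expressed as an ordered runtime preference with fallback. The sketch below illustrates the pattern only; the function names are invented stand-ins, not the actual Qualcomm Neural Processing SDK API.

```python
# Ordered preference: most power-efficient engine first, CPU as the safety net.
PREFERRED_RUNTIMES = ["DSP", "GPU", "CPU"]

def load_with_fallback(model_path, is_available, load_on):
    """Try each runtime in order, falling back when one can't be used.

    `is_available(runtime)` and `load_on(model_path, runtime)` are
    hypothetical stand-ins for SDK calls that probe the hardware and
    build the network on a given compute engine.
    """
    for runtime in PREFERRED_RUNTIMES:
        if not is_available(runtime):
            continue
        try:
            return load_on(model_path, runtime), runtime
        except RuntimeError:
            continue  # e.g. an op in the graph isn't supported on the DSP
    raise RuntimeError(f"no usable runtime for {model_path}")
```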

Q: Actually, this ties into my next question. Like [Google TensorFlow tech lead] Pete Warden said, a lot of this stuff is yet to come to light. He used this analogy that it’s like we’re playing with LEGOs, and we’re still trying to figure out which use cases to implement it on, and also how to do it, how to split the training and the inference… All this stuff is obviously very complicated, and that’s why, when a standard gets set and we say, “OK, we can figure out how to do this for computer vision,” it explodes and you see all these computer vision applications. So what approaches is Qualcomm taking to stay on the cutting edge in this fast-moving field?

A: So, we have two functional groups: the research side of the house and the commercial side of the house. The research team has actually been looking at this problem since 2008. They’ve been focused on deep learning, looking at spiking neural networks and deep neural networks, but it’s only been in the past two years that the work started to matriculate toward production. That’s why I was brought into the QCT group, to kind of shepherd technology out of them. We look as far out as we can, but we can’t always predict what’s coming next. We’ve had connected-camera customers come to us wanting a specific detector network, a single-shot detector (SSD). They say, “This is what we need, here are the KPIs, can you support this?”, so we get into, you know, a month and a half or two months of development time supporting that. Then a new type of network [comes out] that does detection with fewer pixels, and the customers pivot in the middle of the development cycle; they say, “We don’t want that anymore,” so we have to be very nimble. We’ve even gone from having a release cadence for software of maybe three months, which in Qualcomm’s world is almost overnight.

With this AI technology, it started at three months, [and] now it’s down to a month. So you have monthly release cycles, [and] parallel sprints to make this happen. We’re doing our best to keep up […] I think everybody finds that the pace of advancement in this marketplace is so blazingly fast that it increases development costs and burden.

Q: In the [AI Q&A] talk, some of the presenters brought up how cripplingly long it can take to train some of today’s most useful neural networks, and how the fact that training can take so long can, you know, stall research and development, because you can only iterate so fast.

A: I know that can happen if you’re talking about something that is incredibly complex and you’re dealing with millions and millions of pieces of data – photos or whatever. But I believe there are others who are focused on training who have been able to get training down to a matter of hours, like four hours or half a day. So I think it depends on the complexity of the network.

Q: Absolutely, but one of the solutions that was brought up is federated learning, which I think is perfect for mobile devices. Do you see future Qualcomm devices, which are of course tailored to machine learning, being used for this kind of model, running inference and possibly training tasks at a power efficiency that doesn’t really impact the user?

A: Yes, that’s the answer! The answer is yes, and we’re in the early days of that kind of exploration. But you know the use case that Pete talked about, where they keep learning that behavior for the consumer? It’s not just about the fact that it could be run on-device. You can set rules to say, “Do not do any of this unless the phone is not being used and it’s plugged in,” right? Job done. I think it’s totally possible. I don’t have an answer for you as to when we would actually have anything, but you can do it on Snapdragon today. So partners like Google, who are really ahead of the curve, take advantage of the hardware now. Since the Snapdragon 835, they’ve been working on optimizing specifically around HVX. So that’s a fairly long-lasting relationship.
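Editor’s note: the rule Brotman quotes is easy to picture as a gating check wrapped around any on-device training work. Below is a minimal sketch with hypothetical device-state helpers; on Android, this kind of information comes from BatteryManager and JobScheduler constraints.

```python
def should_train(device):
    """Gate on-device learning on charger state and idleness.

    `device` is a hypothetical handle exposing phone state; the battery
    margin is an extra safety rule, not something from the interview.
    """
    return (device.is_plugged_in()
            and device.is_idle()
            and device.battery_level() >= 0.5)

def maybe_train(device, run_training_step):
    if should_train(device):
        run_training_step()  # e.g. one local federated-learning update
```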

Q: And now, back to the developer side. What I see is that research in academia, as well as certain industry sectors like health care and even some startups, has largely embraced machine learning. Smaller developers, not so much… and independent developers, negligibly so. How do you see Qualcomm, or the industry at large and the advancements that we are seeing, bridging that gap in the future?

A: To me, I think it comes down to the tools that are available. It’s tools and data. If you’re a small developer trying to build an AI-driven application, the primary thing you’re dependent on is the data to solve the problem. There are plenty of sources of free photos, like ImageNet and a variety of others, where developers can grab common sets of data. They may not be differentiated in the marketplace, but at least they give you a starting point. Data acquisition is number one. Number two is the tools you have available to learn how to train a model, [and] to quantize a model to run on a device in 8-bit fixed point. The tools are all there today. In fact, the thing that I’ve noticed, and I’ve talked to the tech lead on my software team about this […] when we think about hiring, we may need one or two guys that have machine learning expertise, but most of it is just brute-force software development, with some flavor that bleeds into what folks are doing with taking a trained model. […] But yeah, I think the community at large has been very forthcoming. A lot of things are open-sourced, [and] there’s no shortage of information.
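Editor’s note: “quantize a model to run on a device in 8-bit fixed point” means converting float32 weights into 8-bit integers plus a scale factor. Below is a minimal symmetric post-training quantization of a single weight tensor; production tools (TensorFlow Lite, Qualcomm’s SDK tooling) also calibrate activations and use finer-grained scales.

```python
import numpy as np

def quantize_symmetric_int8(w):
    """Map float32 weights onto int8 with one shared scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_symmetric_int8(w)
print("max abs rounding error:", np.abs(dequantize(q, scale) - w).max())
```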

Q: It’s such a powerful technology and to see it open-sourced to such a degree, especially from large corporations that stand to benefit from not doing that... it’s quite impressive.

A: I can tell you there are executives on my team, senior executives, who have decided to take a Coursera course to bone up on stuff that they learned about in college that was theoretical, that was way far out into the future. But there are so many resources available to developers today that it minimizes the burden on us; then we just have to educate [people] on how to use our tools right now, and how easy our tools are [to use] relative to others. Does it level the playing field, or are we isolated? I think we have to make sure that we’re standardized in some way to make it easy. I think that’s our goal.


