Nick Durkin is the Field CTO and VP of Field Engineering at Harness, where he focuses on solving the technical challenges that are standing in the way of true innovation. Today, we're talking about how Harness uses AI and ML to remove the most annoying parts of your job so that you can focus on what you do best.
Liesse Jones: Welcome to DevOps State of Mind, a podcast where we dive deep into the DevOps culture and chat with friends from small startups and large enterprises about what DevOps looks like in their organizations. I'm Liesse from LogDNA. Join us as we get into a DevOps state of mind.
Liesse Jones: All right, Nick, welcome to the show.
Nick Durkin: Liesse, thank you so much for having me on. Genuinely appreciate it.
Liesse Jones: Yes, this is going to be fun. I'm excited to get into the conversation.
Before we start, can you just tell the audience a little bit about yourself and how you came to be the Field CTO and VP of Field Engineering at Harness?
Nick Durkin: Yeah. It's an interesting role, but one of the neat things is that you aren't told this is an opportunity you have coming out of college. Like this is not something you realize, and like a lot of us, we came from being the customer. So I came from building redundant data centers for the banks, building a whole bunch of patents in fraud prevention and mobile authentication and data aggregation, and doing it for the financial institutions, for critical infrastructure for the United States.
And doing it for some of the most highly regulated, most complex environments gave me a little bit of insight to be able to do it for a lot of different companies, so I made the jump over to the other side of the world and am now selling to the companies rather than being the consumer. I worked a lot in Silicon Valley with a lot of different VCs and a lot of different SaaS startups, specifically business-to-business SaaS startups, really helping with finding market fit, finding first customers, and growing sales engineering teams. My background specifically was in data analytics, AI, ML. I used to be called the “biggest guy in big data.” I think I've probably lost that title over time, but that was the nickname that used to be coined.
Liesse Jones: I like it. So catchy. Where in the world are you located?
Nick Durkin: Great. I actually tell everyone that I live on an airplane because I'm usually wherever customers are. I'm all over the world, but I do my laundry in Arizona. So my family and I reside here in Scottsdale, Arizona.
Liesse Jones: Awesome. It's beautiful there. I’m originally from New Mexico. So I’m a desert girl at heart.
Nick Durkin: You know it well.
Liesse Jones: Let's dive a little bit into your role. I think for some people, the role of Field CTO might be a new concept. You said yourself, it's not necessarily something you knew was a career path right out of college. What does that exactly mean and how does it differ from your “Standard CTO,” I'm using air quotes for those listening.
Nick Durkin: Yeah, a typical CTO. When you think about a field CTO, this role has bearings in the sales side. So in the pre-sales portion of it. And so my background comes in the architectural side and designing, and this is what we did early days. Jyoti our founder, Rishi our co-founder—both co-founders—we would sit in rooms and we'd whiteboard and figure this out in the early days. And then we had to figure out how it was gonna work with customers. And so that's where you need someone who can actually interact with the customers and who can actually bring that data back and ask even the hard questions of why.
So a lot of customers will bring requests like, “hey, we really want to do this.” Why do you want to do that? There's 450 people who don't, that are very similar to you, that don't do that. Why would you want to do it? And really get to the why. So it's about being that technical individual that can help shepherd your product through to other customers, but then ultimately bring back the requirements and the requisites back to engineering.
I think of the Office Space movie. And the guy says, “you know, I'm a people person. And I bring the requirements from the customers to the engineers and they can't talk to each other.” So in part it's that. But it's also in part being able to deliver a roadmap and understand at the most technical levels how our product works and how it integrates. So then when I do have a conversation with the customer I can truly understand how that could actually function, where it could be built, what timeframes it would take to make. So it's really more customer facing than a traditional CTO. Still work hand in hand with product, but still also work with sales engineering teams, solutions architecture, and post-sales as well.
Liesse Jones: Awesome. I'm sure you've read the book Start With Why. It's like in the marketer's starter pack, so definitely something that's been on my bookshelf for a long time, but I think it's so important to just ask that foundational question.
Nick Durkin: It's extremely important. I think oftentimes we're dealing with extremely intelligent people, so they've already figured out 90% of the puzzle. And oftentimes their question will be, can you solve the last 10% for me? Now, we don't know we're getting asked that, but that's the question: hey, can you solve this little thing that I have a gap for?
And what's infinitely more valuable to the company who's asking for it or to the business is to figure out what was the ask as a whole in the first place. Where did the beginning start? Why do we need that 10%? Okay. We'll actually need more. Okay. What have you done to get there? How have you approached it? What could we do differently? Because oftentimes you can leverage what's already there or take a different route that wasn't available to the company before. And so giving them new information can help them make different decisions, but if we don't get to the why it never shows itself.
Liesse Jones: Tell me a little bit about the problems that you're solving at Harness and what it is that you all do.
Nick Durkin: We started life as continuous delivery as a service. So when you think of CI and CD, we talk about them together, but we'll talk about CI as being code to artifact. Code, doing a test, getting a build, getting an artifact out. And when we think about continuous delivery we think about it from artifact, not just to production, but to the customer, because the customer has got to be happy. If it hits prod but it doesn't work, that doesn’t do us any good, so we say artifact to customer, and that's where we found the hard part was. So we solved continuous delivery as a service for folks. That's what we built our company around, and with all the things that we've done, we've built it with a mindset of using AI and ML to actually make people's lives easier.
So we started with continuous delivery and we said, let's think about deployments the same way your best engineers would. So let's go look at the metrics that we care about. Let's go look at the performance. If those are slow, let's go look at the logs and understand, you know, what's actually broken, and think about it like your best engineers.
Yep. That's fine. That's fine. I see it every time. Don't worry about it. Ooh, there's a new one. Let's go look at that.
So when you think about it, if you ask anyone in the DevOps space, “who loves babysitting deployments?” everyone will tell you “no.” You ask them, “who loves the free pizza in the war room?” everybody's like, “heck yeah.” But now war rooms are remote, so we're not getting the pizza anymore so it's just battle together. And so that was where we started the company, with just continuous delivery. And all of our customers said, hey, there's so much more in the software delivery space that we'd love for you to take and help us with in that same way. And with that, still roll in governance and compliance and security and auditability and speed, but really let's take the worst part of our jobs away from our engineers and some of our best people.
So we entered the CI space, even though we thought it was solved. We thought CI was kind of a solved thing. Jenkins had it sorted, right? The Circles of the world had done it. And our customers said, hey, we'd love that same paradigm that you have for continuous delivery for our build side, for CI, for developers. And so we of course acquired Drone, the most loved open source CI tool on the planet, and massively invested in that.
But then even then with our enterprise offerings, let's take care of what engineers hate. So who loves running tests forever? Right? Like, hey, I go do my build and I've got 7,000 tests I have to run. I'm going to go get coffee and wait. No one likes doing that. So let's think about it logically. If I changed the gas cap on your car, you wouldn't expect me to go check every single electrical system and every single light and every radio; that wouldn't make sense. But we do that every time we do a build and it just doesn't make any sense. So let's take away all of that extra workload. Let's take that from our engineers. Let's give them some of their time back. Let's prioritize it. Let's take the tests that fail often and let's run those first. Like, why not, why not fail fast? So do that in a meaningful way. And really it's the same throughout this whole process, the entire portfolio that we have, whether it's that or cloud cost management. No one loves turning servers on and off for non-production to save money. It's great that it saves money, but it's a pain. If I come into the office and my servers aren't up, or if I'm trying to get in on a holiday when I feel like working and they're not running, or vice versa, we're just wasting money.
So why don't we do that again, intelligently, like your best engineers would. And then the same thing for feature flags. If you want to do complex deployments, you can do it with our continuous delivery. But if you want to flag a specific module to a group or to an environment or to a region or a demographic, a lot of tools are out there that do that today, but they're not integrated in your pipeline where it's audited and compliant, and they're not thinking about it like your best engineer. So if you ask a company, “well, great, I flipped a flag so that the east coast gets a different view than the west coast,” how do you verify it? And they verify like we used to verify deployments 10 years ago, which is we asked the customer. So let's do it intelligently. So I know that was a long answer to a short question, I apologize for that.
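Editor's note: the test-selection idea Nick describes above (only run the tests a change can affect, and run the ones that fail most often first) can be sketched roughly as follows. This is a minimal illustration and not Harness's implementation; the `TestRecord` type, its `covers` field, and the sample test names are all hypothetical, and it assumes you already track which modules each test exercises and how often each test has failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestRecord:
    """Hypothetical per-test history; none of these fields come from a real tool."""
    name: str
    runs: int           # how many times the test has been executed
    failures: int       # how many of those runs failed
    covers: frozenset   # modules the test exercises (assumed to be tracked somewhere)


def failure_rate(test: TestRecord) -> float:
    """Historical failure rate, treating never-run tests as 0.0."""
    return test.failures / test.runs if test.runs else 0.0


def prioritize(tests, changed_modules):
    """Keep only tests touching a changed module, most failure-prone first."""
    changed = set(changed_modules)
    relevant = [t for t in tests if t.covers & changed]
    # Sort by descending failure rate; break ties by name for a stable order.
    return sorted(relevant, key=lambda t: (-failure_rate(t), t.name))


if __name__ == "__main__":
    history = [
        TestRecord("test_radio", 100, 1, frozenset({"audio"})),
        TestRecord("test_fuel_cap_seal", 100, 20, frozenset({"fuel"})),
        TestRecord("test_fuel_gauge", 100, 5, frozenset({"fuel", "dashboard"})),
        TestRecord("test_headlights", 100, 30, frozenset({"lighting"})),
    ]
    # Changing only the "gas cap" selects the two fuel tests and skips the rest.
    for test in prioritize(history, ["fuel"]):
        print(test.name)
```

In the gas-cap analogy, the radio and headlight tests are never scheduled for a fuel-only change, and the test most likely to fail runs first so a bad build fails fast.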
Liesse Jones: No, that's what we're here for. We just want to hear you talk.
Nick Durkin: Yeah. Well, I appreciate it. I can do that a lot. So you can tell me to be quiet at any point.
Liesse Jones: No, not at all. You mentioned compliance a couple of times. Let's just take a quick pause on that and talk about why that's really important for your customers and if there are particular compliance standards that you hear people asking for more than others right now.
Nick Durkin: I think this is one of the things that people forget when they start on their continuous delivery journey or their software delivery life cycle. We get into doing and changing our ways because we want to go fast. So the first reason is, hey, I want to go fast and I moved to an automated approach because I wanna be able to get things out to our customers faster. And that's pretty much standard across the board. What we find then after that is now that we started going fast, we have to do it governed. If we look back five or 10 years, the idea was that every customer was going to be digital. Well, now, if you're digital, you're under some regulatory requirement. I mean, if you've got a store you've got PCI compliance and a few others. So whether it's SOC, it's PCI, or whether it's HIPAA, if you look at all of our customers, traditionally, they are regulated. Most of them, not all of them, but most of them. And the reason is because doing this fast in a governed, compliant way is really hard. And it's proven because we have so many DevOps people trying to do it. This is not easy. This is the hard part. So when you have a tool where this is built in from the ground up, this is where it makes it beneficial. So you can guarantee you've met your compliance, your security, your governance. You can actually make sure that if you fail because of that, you invoke proper failure strategies, and it's part of a platform from the ground up. And when you think about Harness, everybody's on a different stage in that journey.
So traditionally people will start with let's go fast. Then after they go fast, they will have to govern it and they'll script it and they'll figure it out. You know, they'll use their open source tools or they'll use the CI tool and try to do it themselves. If they can achieve it, amazing. The next issue they actually have is quality. So you go fast, right? Then you find a way to govern it and make it secure and compliant. Now you can deploy really fast. So your monoliths are turning into microservices. Your microservices are now deploying daily, multiple times a day. You know, they're infinitely faster. You've got an end-to-end problem. And people don't know what broke what anymore. And this becomes the next challenge. So how do we verify it? And what's interesting is if you can achieve all three of those, if you've been able to script and create something that can deploy successfully, and if it fails has the appropriate steps and measures to take care of those failures for you, now you get into an efficiency issue. Which is how many resources am I using? How often are they up and running? What percentage am I using? Are we leaving them open? And now we have issues where cost comes into play, especially if we're in the public cloud, because we're deploying so fast, things can be left out there.
And if you don't plan your entire software delivery life cycle around that specific set of requirements, you'll hit these as you progress. And everyone we find is somewhere in that journey. Very few times we found somebody who, from cradle to grave, has got it all sorted. We're all working on that.
Liesse Jones: How do you think the cost factor changes depending on where a company is in its journey? I feel like we see this a ton in the valley with companies who had a huge series A or series B, like this year was just insane the amount of money that is going into these companies. So something like cloud costs might not be that big of a deal to them. And they build around this idea that they don't have to be thinking about it. And then eventually get to a point where they're like, crap, we have built an entire business around not caring how much we're spending on infrastructure. How do you see that show up for your customers?
Nick Durkin: What's even more interesting is we see customers that say, let's just go build, we'll build this as we can. And of course, we go to the cloud because I can start cheap and I can have an HA data center for two tiny instances, and I'm highly available across two data centers. That's amazing. That would cost me millions of dollars just to build the data centers, let alone put one server in it. So your barrier to entry has come down with cloud. So we've seen that, but what ends up happening is that in these companies, we go and say, hey, there's the auto mall with every car manufacturer out there, go pick any car you want. And, you know, some of us are smarter about it, and we put a little boundary on it, say, hey, you can only spend this much a year. Right? You give them a budget, but they don't even know the budget's there. They don't know how much they cost and they go grab the Ferrari. And the cool part is they get to use that for three hours and they love it. But then they realize, I've got to give it back because I've hit my budget. And this is just not an appropriate way.
So what we found is that people aren't actually empowering their engineers. Engineers don't have visibility into the cost, yet they’re being pushed to shift left everything, including the choice of server, and they're not given the dollars-and-cents information. So our view on it is, let's empower those people. Let's give them that information now so that they know when they're going to deploy what it's going to do to their spend.
Hey, this is going to 10X your spend, because you're asking for a whole bunch more replicas and a whole bunch of memory that you don't need. You probably want to know about that before you press purchase as opposed to 30 days later, when the CFO yells at you, and 15 days later after you've done all your research and figured out we need to shut down. So now you paid for that for 45 days. It just doesn't make sense. Why wouldn't you empower the people that are being specifically given the permission and the requirements to go make those decisions? Why wouldn't you give them that information? And that's where we want to change the game.
A lot of people say cloud cost management does not belong in the software delivery lifecycle. Wrong. I think it doesn't exist there today because the tooling doesn’t do it. I think it's absolutely where it belongs if we truly believe in DevOps, and we truly believe in CI/CD, and we believe in shift left. If that's true, they need that information. They have to have it. And so it's really about empowering them.
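Editor's note: a pre-deployment spend check of the kind Nick describes ("this is going to 10X your spend") could look something like the sketch below. Everything here is an assumption made for illustration: the flat per-GiB-hour price, the `DeploySpec` type, and the warning threshold are hypothetical, real cloud pricing is far more involved, and this is not how Harness computes cost.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_MONTH = 730        # rough average number of hours in a month
PRICE_PER_GIB_HOUR = 0.005   # hypothetical flat memory price in dollars


@dataclass(frozen=True)
class DeploySpec:
    """Hypothetical deployment request: replica count and memory per replica."""
    replicas: int
    memory_gib: float


def monthly_cost(spec: DeploySpec) -> float:
    """Estimated monthly cost under the flat-price assumption above."""
    return spec.replicas * spec.memory_gib * PRICE_PER_GIB_HOUR * HOURS_PER_MONTH


def spend_warning(current: DeploySpec, proposed: DeploySpec,
                  max_multiplier: float = 2.0) -> Optional[str]:
    """Return a warning if the proposed spec multiplies spend past the threshold."""
    before, after = monthly_cost(current), monthly_cost(proposed)
    if before <= 0:
        raise ValueError("current spec must have a positive cost")
    multiplier = after / before
    if multiplier > max_multiplier:
        return (f"This change would multiply monthly spend by about "
                f"{multiplier:.1f}x (${before:,.2f} -> ${after:,.2f}).")
    return None


if __name__ == "__main__":
    # Going from 2 replicas x 4 GiB to 5 replicas x 16 GiB is a tenfold increase.
    print(spend_warning(DeploySpec(2, 4), DeploySpec(5, 16)))
```

The point of the sketch is where the check runs: at deploy time, in front of the engineer making the request, rather than weeks later on an invoice.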
Liesse Jones: We recently did a body of research with the Harris poll to understand how people were approaching observability and this intersection of cost and observability data management. Something that came up was that everybody thinks that they need to save all of their data, obviously, but it's super expensive, especially if you're using a single pane of glass solution, or if you're sending a bunch of stuff to a SIEM and duplicating that data and sending it into a log analysis tool as well.
It turned out that a huge percentage of people were actually dropping logs and not storing them at all once they hit whatever their budget cap was, which is super dangerous.
Nick Durkin: And super against regulatory in a lot of cases, not all cases. Usually people are particular about those cases, but oftentimes this happens.
Liesse Jones: Yeah. And then you don't have the information that you need when you need it and that can show up in so many different ways, right? From a regulatory or compliance perspective if you have to do an audit and the information is not there you're not compliant. Even if you just need to troubleshoot or debug an issue.
I find it really interesting and I love that you talked about shifting the control point left, and I think so many different industries are looking at how they can do that and how they can give more people more information and more control over making informed decisions. Like it doesn't have to be that way. You should be able to choose which data goes where, route it appropriately based on cost and volume and a ton of other things. But it still feels like early days.
Nick Durkin: Yeah, it's interesting. You know, I think a lot of people are starting to see a ton of new startups coming on that are specifically at the early ingestion of that data, right? The log data. They're really starting to process a whole bunch of it so that they can cut down costs, they can ship less, but still meet the requirements.
And I think it's interesting that it's become that costly or that complex a problem that we're building entire companies around minimizing that. So that tells you there's a pain. So look at the market, you'll know there’s pain, look at the dollars. So we know that that's true.
And what's interesting is that people don't realize that they don't have access to the data they need until they're asked, and that's the worst spot to be in. Where you believe you had what you needed, you thought you were making the right decisions, only to find out that you didn't. And I think you and I even talked about this once before, which was when we talk about software delivery oftentimes we talk about people, process, and technology or people, process, and tooling. And so we can take the logs and analytics and put it in the tooling side but the reality is this is about confidence. Do you have confidence that your people can deploy appropriately without making mistakes? Are you confident that you've made it easy for them to do the right things and you made it hard for them to do the wrong things? If you are, you'll let them deploy 24/7, 365. Like, why do we have a blackout window? It comes down to whether you trust all of the measures in place, whether it's people, whether it's process, whether it’s technology, and that's why I think that framing is kind of garbage. It has nothing to do with people, process, technology; it has a hundred percent to do with confidence. If you're confident in the way that it's going to happen, you'll identify.
Liesse Jones: I love it. That's such a good way to frame it.
Liesse Jones: Let's talk about AI and ML a little bit and what that means at Harness. Actually, maybe before we get into what it means at Harness, let's just hear how you would describe AI and ML for people who often conflate the two.
Nick Durkin: Perfect. I think this is great because I actually teach a course on this because this is such a misused term. People will say AI, and when you get down to it, they mean a standard deviation, which isn't even machine learning. Like this is just a mathematical calculation.
And so oftentimes marketers or different people will actually misuse the terminology. A lot of people will conflate it and say, “oh yeah, we use a ton of ML, a ton of AI” and it's just math. It's basic math, deviations and so forth. When it gets into machine learning, we think about it in two categories: supervised, meaning humans have the ability to interact with it, or unsupervised, where we let the machine do it itself. And so you take like natural language processing, let the machine figure out are these different words or are these different pieces. This is great. It's a great concept, but that's unsupervised. I'm not sitting here telling it whether those two words are the same. It's figuring that out. Yet sometimes we want to give human context to machines. We want them to be able to understand things the way we do. And so that's where you want to give it neural feedback. And oftentimes people will say, oh, that's AI. It's not, that's actually human interaction. So that is a supervised machine learning model. Where AI comes into play, and oftentimes the boundary, is in things like neural networks, where they're actually learning for themselves so they get smarter as the data comes in and they continue to process and build their models themselves. And that's kind of the entry level to artificial intelligence, where it's truly thinking about it itself.
Here at Harness, we use a massive amount of machine learning and then we use a tiny bit of AI where we found some mistakes that machine learning couldn't handle. And I'll give you a very clear one so that it makes sense specifically in the logging environment. This stands true for both of us.
So when you think about logs today, if I showed you two logs with a different user ID or account ID, you and I would know because we can look at those and say, oh, they're the same log. But if I showed it to a model that was using natural language processing, it would actually say they're different logs because there's two account IDs. So there's human context you want to add. And now I could keep training it every single time to say, “no, that's an account ID” but you've got thousands of customers, I don't have time for that. I have to build neural networks to think about it like humans do. And so that's what we've done, built some really neat neural networks to go and understand in your logs, oh, that's an account ID. That's a URL. I can take that away and ignore it and I can look at the actual log. Not only that, I can look at where the log comes from. So where in the actual app stack it comes from as well as the exception itself, and really start determining: is this the same exception in the same part of the application, the same exception in different parts of the application, or a different exception in the same part of the application? And so there we're going to even use different methodologies to understand distance as well as similarity: are they similar, but then how far apart are they in the application? And so we use a massive amount of competence to get to it. But without those neural networks, when we first deployed this, one of our customers, build.com (you know, like a Home Depot for online; I've heard them described as if you know what you want, you go to Home Depot, if you don't, you go to build.com and you figure out all the things you want), actually had an issue where their checkout counter had broken and they used Harness to continually verify whether this was working. Their APM metrics told them that everything was fine, their performance metrics said that everything was fine, their log data said everything was fine. No exceptions, no issues.
What was interesting is that they keep a metric, and I think a lot of retailers do this, that says what percentage of people put something in their checkout counter and then actually check out. Some people put it in there and they never do, for a certain portion, but that number went down to zero. Harness rolled back in 30 seconds. Traditionally, they would have taken a 30-minute outage, and that's literally revenue you can count; for a retailer it's millions of dollars. So instead, 30 seconds later, it was back to normal. So basically in the time that you retry to buy it was up and ready and going, and they had to do nothing. There was no human interaction, no changes of code, no figuring out what and how it was configured. Because Harness knew what was deployed on which infrastructure with which configuration and which secrets to get them back to where they were before this, all using that AI and that ML. And so we truly do use a massive amount of ML depending on whether it's time series data or log data and then we use some really cool neural networks to think about things like humans, too.
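Editor's note: to make the log-comparison problem from this answer concrete, here is a deliberately simple sketch of masking variable fields (account IDs, URLs, numbers) so that two log lines describing the same event compare as equal. Note the swap in technique: Nick describes neural networks that learn which fields to ignore, whereas this sketch uses hand-written regular expressions, which is exactly the per-pattern manual work he says does not scale. The patterns, placeholder tokens, and sample log lines are all hypothetical.

```python
import re

# Hand-written masking rules (illustrative only); order matters, so URLs are
# masked before bare numbers to avoid rewriting digits inside a URL.
_MASKS = [
    (re.compile(r"https?://\S+"), "<URL>"),
    (re.compile(r"\b(account|user)[_ ]?id[=:]\s*\S+", re.IGNORECASE), r"\g<1>_id=<ID>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def normalize(line: str) -> str:
    """Replace variable fields with placeholders so only the log 'shape' remains."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line


def same_event(a: str, b: str) -> bool:
    """Treat two log lines as the same event if they match after masking."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    first = "ERROR checkout failed for account_id=12345 at https://shop.example/cart/987"
    second = "ERROR checkout failed for account_id=67890 at https://shop.example/cart/111"
    other = "ERROR payment gateway timeout for account_id=12345"
    print(normalize(first))
    print(same_event(first, second))  # differ only in the masked fields
    print(same_event(first, other))   # different underlying message
```

A learned model replaces the `_MASKS` table with something that generalizes across customers' log formats; the comparison step afterwards is conceptually the same.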
Liesse Jones: Cool. I love it. I've heard you say that you wouldn't even consider working at a company that wasn't using AI or ML. Why do you feel that way?
Nick Durkin: Great question. If you go back to the late 1800s everything was done by horses, right? Like, I mean, seriously, right? Like we farmed with horses, we built with horses, we traveled with horses, and then we built mechanical muscle and it made our lives infinitely better.
The automobile changed a lot of things, manufacturing, the combustion engine in general. Now we made everyone's jobs easier. We lifted them up, we raised all the tides and all the boats rose and we're in the same kind of area now but instead of mechanical muscle, we've got mechanical minds. And to that point, when I talk about what we're doing, we're not taking people's jobs, we're making them a lot easier. And that is where the massive amount of opportunity lies. Taking away the most mundane part of people's jobs is where you're going to see a huge uptick in software. And so in my mind, I'd rather be on the side that’s creating those innovations as opposed to being one that is impacted by them.
Liesse Jones: Other than Harness, who do you think is doing really cool stuff with AI and ML?
Nick Durkin: I have to go to Nvidia. I mean, if you watch what they've done just in the hardware side of this space. When you look at the original math coprocessors that were created and then turned into video cards, and now what we're really doing with them there's some massive places.
There's also some really cool in-memory databases that give instantaneous responses on terabytes of data. And it's just amazing to see. What I used to have to wait weeks to get in a report out of Hadoop clusters, where it’d take hours, now I can have with sub-millisecond response times, so it's absolutely amazing because of the way that they're starting to do this work.
Liesse Jones: Sometimes I do an ice breaker with folks before the podcast. We didn't do it today, which I'm regretting, but one of the questions that I love asking is, do you think that AI is as far advanced as the public thinks that it is?
Nick Durkin: That's a great question. And I think it's actually both, so it is not, and it is. And I think where there's profitability, it is far. And where we thought it would go in helping humans in just generic capacities, it isn't. And so where we've spent a massive amount of time and energy and effort building artificial intelligence to handle those really heavy workloads, the assumption made by the news and by the populace is that it would be mostly for the consumer, and it's not. It's for business to business because that's where it's being monetized. The benefit though, is that all of that work carries over to the consumer market. So it will be there, but I believe the business market’s achieving where it needed to be. On the consumer side I don't think there's enough time and energy spent and I think it's because it's difficult to monetize.
Liesse Jones: What period of time do you think we need to wait to see that change?
Nick Durkin: I think you need one big catalyst and I've been looking at the self-driving car as that catalyst. Once you get enough of a percentage, five, ten percent, of self-driving cars that can prove without a doubt that they are safer than humans driving, at that point you will change public opinion, you will not have the fear, a lot of those things that are preventing adoption in the consumer space. I think once you reach that then you've actually achieved it. And then what you'll see is a huge windfall. So you'll see it go from 10 or 15 percent to 70 in very rapid succession. And if you go look historically, that's accurate to how we see trends just in general. And I think you'll see the same thing for AI. Once it's useful in an area that we have as a burden to us every day and we can see that it's useful, then it'll truly take over.
Liesse Jones: Going back to your horses. Perfect example.
Nick Durkin: Perfect example. Right? And that's the thing is these are industries that we're affecting that are our largest industries in America, if you look at transportation and retail they are our two largest industries, and if you look at the most automated it’s transportation and retail. And you'll continue seeing that again in the consumer space.
Liesse Jones: Am I remembering correctly that Arizona was one of the first places where self-driving cars could operate without a human?
Nick Durkin: Absolutely. If you look at Arizona specifically, or if you look at the Phoenix area where they've primarily been using them, we see Waymo's around here all the time and different Google ones, and of course Tesla tries them on here as well. What you'll find is that our streets are built on a grid because we haven't been a state that long, you know history is recent here in Arizona, so our roads are actually built on a grid and they're built wide and they're built without some of the things that are a little harder. If you bring a video game home and your children want to play it, you don't put them on the hardest level with all the complexities on day one, you start and the levels get harder. Phoenix is going to be the easiest area to do it. It's a grid, large streets. Once you get into more complex areas, like Boston and San Francisco—
Liesse Jones: Yeah, I know it's a nightmare to drive here as a human.
Nick Durkin: Oh yeah. As a human it's hard and now imagine all the inputs. So it's much easier to do it here. It's been great. And they can test it with some of the elements that they worry about. Phoenix is used oftentimes for the heat as well, just to make sure that those chips aren't overheating when they're getting into real-world scenarios.
Liesse Jones: Anything I didn't ask about Harness that you want the world to know or about yourself personally?
Nick Durkin: Every business has drivers and intent; at Harness we're built around our people. We have very “remember the human” kind of concepts. Like tomorrow I won't be working, nor will anyone at the company because it's one of our TJIFs so everybody gets the day off because we know in this virtual world, we're actually working more. We're back to back to back and there's no breaks. And so get some time to go do those things that you need. And so we think about people and then I think, you know, our intent is to truly change the way people are doing business. Everybody is saying I want to change the world. Well, if I can change the world and give back some nights and weekends to the developers that aren't home with their kids, if I can do that I think I've made an impact. And if that's all we do, great, but I think we can do much more, but that's literally what we're here to do. We're here to make people's lives infinitely easier in the software delivery space.
Liesse Jones: That's amazing. Well, Nick, it's been amazing having you on the podcast. Thank you for joining. For people who want to learn more about Harness, I will be sure to link to your website and I love what you guys are doing, super excited to see what's next.
Nick Durkin: Absolutely, tons of new announcements coming around availability for more technology, more things around the software delivery space, and maybe even some things being made available for more developers, so tons coming here in the future. Thank you so much for having us on. Reach out, harness.io is our website and you can go get a free trial there.