Pybites Podcast

#210: Codeflash and continuous Python performance with Saurabh Misra

Julian Sequeira & Bob Belderbos Episode 210



Speed isn’t just a nice-to-have - it affects user experience, cloud costs, and how fast teams can move. In this episode, we chat with Saurabh Misra about making Python performance a continuous habit rather than a last-minute clean-up. He introduces Codeflash, a tool that profiles real code paths, explores optimisation options with LLMs, and only suggests changes that preserve behaviour and deliver measurable speedups.

We delve into how this works, from tracing and line-level profiling to coverage-guided inputs and concolic testing. Saurabh shares real examples, including smarter NumPy usage, avoiding unnecessary global sorts, and using Numba to speed up numeric hotspots. We also talk about fitting performance checks into everyday workflows via the CLI, VS Code, and GitHub Actions.

The big takeaway: performance doesn’t have to slow teams down — with the right tooling, it can be part of shipping well from day one.

Connect with Saurabh at https://www.linkedin.com/in/saurabh-misra/ and find out more about Codeflash via the website https://www.codeflash.ai/.

___

💡🧑‍💻 Want to become a more focused, motivated, and effective Python developer? The Pybites Developer Mindset (PDM) Program helps you build strong habits, deepen your skills, and make real progress in just six weeks. Join us for mentorship, accountability, and a supportive community that keeps you moving forward. Start your PDM journey now! 🌟✅ https://pybit.es/catalogue/the-pdm-program/

___

If you found this podcast helpful, please consider following us!
Start Here with Pybites: https://pybit.es

Welcome And Codeflash Teaser

Saurabh

Right now there isn't really a great option for these companies other than actually asking some really senior people to go and optimize it. And that's expensive. So if they can use Codeflash, they get all these benefits: lower costs, reduced latencies, faster processing. So it's a big win for these companies.

Julian

Hello and welcome to the Pybites Podcast, where we talk about Python, career, and mindset. We are your hosts. I'm Julian Sequeira, and I am Bob Belderbos.

Bob

If you're looking to improve your Python, your career, and learn the mindset of success, this is the podcast for you. Let's get started. Hello and welcome back everybody to the Pybites Podcast. It's Bob Belderbos. I'm here with Saurabh Misra. Saurabh, how are you doing?

Saurabh

I'm good. I'm doing good. How are you?

Bob

Yeah, good. Thanks for joining. Happy you could make it, because you built a really cool tool, Codeflash, and I'd like to discuss that with you today. I've used it, it's really cool. And then we'll go deeper into the technical side, optimization, which is what the tool specializes in, and the business side as well. But before we do that, do you want to give us a quick intro? Who you are, what you do? Sure.

Saurabh

Yeah, yeah.

Saurabh’s Background And Idea Spark

Saurabh

I'm Saurabh. I'm building Codeflash. So I grew up in India and essentially studied electronics. So I come from a background of computer architecture and building computers and systems. I worked at Nvidia, built performance tools there for a couple of years, then came to the US, studied machine learning, and worked at a few startups doing ML engineering, and at companies like Meta. So I have this background of computer architecture as well as machine learning. And essentially when GPT-4 came out, I realized that now these LLMs can start optimizing code. And I think that's when I found that my passions of writing fast code and machine learning came together. And yeah, that's how I started working on Codeflash itself.

Bob

Nice. So that was more or less at the birth of the GPT-3.5 era, end of 2022, right? Yep, yeah. I think I started working on this idea in 2023. Right. So you were already working in code optimization, and the LLM was really nicely timed to start automating that, right? Before LLMs, how would you go about that?

Saurabh

Yeah, so I think code optimization has always been this problem, right? Since maybe the beginning of computer programming, because computers had really few resources back then. So you had to make your program run under a specific amount of RAM, under a specific number of clock cycles and all that. It's not as dire right now. But code optimization obviously has great benefits. Everyone wants fast programs, wants things to be snappier. And it's also just fun: hey, I wrote this, was there a better way to do it or not? I think it's a fun challenge as well. So I've always been interested in code optimization from that perspective.

Bob

Yeah. Well, first you make it work, right? Then you make it pretty, and then you make it fast. Is it the third thing, or earlier? Yeah, I think we're trying to change that a bit.

unknown

Yeah.

Bob

Yeah, yeah. Oh, by the way, do you have a win of the week? We always start with a win.

Saurabh

Yeah. We at Codeflash try to optimize a lot of open source packages in the Python ecosystem as well. So we've been optimizing Bokeh, the graphing and plotting library. And we found a lot of really good optimizations there. So we're currently merging those with them upstream. Bokeh should be just a bit faster for everyone soon. Nice. Nice.

What Codeflash Does And Why It Matters

Bob

Cool, cool. All right, back to the tool, Codeflash. Do you want to give us a bit of an overview of how it works, at a high level?

Saurabh

Yeah, so Codeflash is an automated code performance optimization tool for Python. Essentially, it figures out the best, or the fastest, way you could have written something. And the way it does that is by doing everything an expert performance engineer would do when they're optimizing code. So we look at some existing piece of code that works, understand its behavior, what exactly it does when it executes, and understand its performance profile. Then we use LLMs to try to optimize it. They don't succeed all the time. But we explore the different types of optimization that are possible for the given problem, and then check and verify: are these optimization attempts valid? As in, do they have exactly the same behavior as before? And if they do, then we check: is this actually faster? So we do this constrained optimization and try to figure out the fastest version you could have written. From that, we take the fastest one, try to improve its quality, and create a very minimal diff for you to look at and merge in. The idea is that once we can automate performance optimization, a lot more code in the world can always run fast. Right now there's a trade-off: you can ship quickly, and when you do that you create this kind of dirty code. Yeah, it works, awesome, and you'll look at it later. But that later rarely happens, especially when you move on to other things.
So the idea is that if you can make the cost of optimization zero, then essentially every piece of code you write can always be optimal, right? And that's the goal we're trying to achieve right now with Codeflash.
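The loop Saurabh describes — generate candidates, keep only the ones that behave identically to the original, then pick the fastest — can be sketched in miniature. The candidate functions below are hypothetical stand-ins, not actual Codeflash output:

```python
import timeit

def original(xs):
    # Baseline implementation: sum of squares via an explicit loop.
    total = 0
    for x in xs:
        total += x * x
    return total

# Hypothetical "LLM-suggested" candidates; one is wrong and must be rejected.
def candidate_a(xs):
    return sum(x * x for x in xs)

def candidate_b(xs):
    return sum(xs) ** 2  # different behavior: square of the sum, not sum of squares

def equivalent(f, g, inputs):
    """Behavioral check: both functions agree on every test input."""
    return all(f(i) == g(i) for i in inputs)

inputs = [list(range(n)) for n in (0, 1, 10, 1000)]
valid = [c for c in (candidate_a, candidate_b) if equivalent(original, c, inputs)]

# Only behavior-preserving candidates are benchmarked; keep the fastest.
fastest = min(valid, key=lambda f: timeit.timeit(lambda: f(inputs[-1]), number=200))
print(fastest.__name__)
```

The wrong candidate happens to agree on the trivial inputs (empty list, `[0]`), which is why the equivalence check needs a spread of inputs, not just one.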

Bob

Yeah, very interesting. Because later never comes, right? Development is very fast-paced, and if anything, you also need to think about code quality. So in doing that, you say you use LLMs. I can imagine it's not a fresh call every time. Do you have a special model you've trained for this? It's the knowledge question, really: where do you get the knowledge for the LLMs? Do you feed them your own experience somehow? Is there caching happening? How does that work?

Saurabh

Yeah, so think of optimization as sort of an exploration problem. There can be various ways to implement the same idea or the same logic, but not all of them are going to be the fastest. And there's some element of discovery here as well: trying different ideas to see which one actually works best in reality. So the way we do it is we either use your performance test suite, or we create our own performance benchmark. Then we go to the LLM, and we do a lot of work on figuring out the context for the code. We obviously look at the code we're optimizing, figure out its inputs, the other code it's going to call, and all the other code context. Plus, when we execute the code, we use things like line profilers to figure out which lines are the bottleneck. That maybe sparks some new ideas. So we do all that, and we go to the LLM and say: can you optimize this? The LLM doesn't really know either; it makes good guesses, and then we verify those guesses by applying them to the code and actually executing it again. Once that happens, we know whether it works or not. And then we use that information to try to make it even better. So it goes through this trajectory, and hopefully we figure out a better way to do it.
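Codeflash uses line-level profilers; a rough stdlib approximation of "find the bottleneck before asking for ideas" uses `cProfile` at function granularity (the functions here are toy stand-ins):

```python
import cProfile
import io
import pstats

def slow_part(n):
    # Deliberate hotspot: O(n) work per call.
    return sum(i * i for i in range(n))

def fast_part(n):
    return n

def pipeline(n):
    a = slow_part(n)   # profiling should point here
    b = fast_part(n)
    return a + b

pr = cProfile.Profile()
pr.enable()
pipeline(200_000)
pr.disable()

# Print the top entries by cumulative time; the hotspot dominates.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

For true per-line attribution, as described in the episode, a dedicated line profiler is needed; `cProfile` only attributes time per function call.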

Bob

Sounds like a complex problem to solve.

Saurabh

Uh yeah, I think it's a really hard problem.

LLMs, Exploration, And Verification Loop

Bob

Yeah, yeah. I like how you have different candidates, how you use the existing tests and even write new tests. I saw all that when I was running the tool, it's really cool. And then using context with the LLMs. It's fascinating. So, do you want to highlight any success stories?

Saurabh

Yeah, I think we've had a bunch of successes. We try to optimize a lot of open source code, partly just as a way to develop the tool itself, to see how it's performing. But we find a lot of really good optimizations. We have optimized Pydantic quite a bit, the data validation library. Also Pydantic AI, where we found some big wins. But other than that, I feel this tool is also really valuable for professional programmers working in companies. Some of the best results we have are for closed-source codebases, where we're speeding up a lot of computer vision models. What ends up happening there is they want really fast latencies, and they have to do a lot of computation to get there, so there's a lot of optimization needed. We've been speeding up a lot of NLP pipelines, quantitative analysis libraries and so on. Our goal there is that we want all code written by professional programmers to also always be optimal. At a company scale that creates a lot of value, and that's where some of our biggest wins are.

Bob

I think there's also training value, because I was running it the other day and it refactored some use of the sorted built-in. You always go for sorted, right? It's a built-in, it's fast. But then it's like: yeah, but if you only care about the last N items, then use a heap, you know, heapq. And it suggested that refactoring. So: oh cool, let's learn something new. I knew about heaps, but it didn't occur to me in that context, right? So I see it not only as a code improver, but also as training the engineers to write more performant code, right?
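The refactor Bob describes looks roughly like this: both lines produce the same top-5, but the heap-based selection avoids sorting the entire list (O(n log k) rather than O(n log n)):

```python
import heapq
import random

data = [random.random() for _ in range(100_000)]

# Sorting everything costs O(n log n) just to read off the first few items.
top5_sorted = sorted(data, reverse=True)[:5]

# heapq.nlargest does a heap-based selection in O(n log k) for the top k.
top5_heap = heapq.nlargest(5, data)

assert top5_sorted == top5_heap
print(top5_heap)
```

`heapq.nsmallest` is the mirror-image call when you want the smallest items instead.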

Saurabh

And I think it sometimes actually discovers new algorithms as well. Maybe not completely new, but one of the first big wins we had was optimizing LangChain, maybe more than two years ago, and it was a very similar type of code. It was matrix multiplication: essentially doing dot products between the rows of two matrices, an n-squared type of Cartesian product, and then finding the top n dot products. And the implementation did the sorting over this whole n-squared list, which again was expensive. And it said: hey, don't do that. Quicksort, for example, has this partition function that, in linear time, creates a partition where the biggest n elements come first and the rest come later. Then you only sort the top n. So instead of this big m log m, where m is n squared, you only pay n log n for the sort, which makes it a lot faster. And this was a big revelation: oh, this is even possible, I didn't even think of this. A lot of times what ends up happening is that Codeflash discovers something much better, and we can prove it's better, so that's good. And Codeflash explains why it's better, and then I go in and read what's going on, and try to understand after the fact what exactly is happening. That leads to a lot of learning experiences for me.
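A sketch of the pattern Saurabh describes, using NumPy's `argpartition` (which performs the linear-time partition step he mentions) on hypothetical score matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 64))
B = rng.random((300, 64))
k = 10

# All pairwise dot products: an n^2-sized result (200 * 300 = 60,000 scores).
scores = A @ B.T
flat = scores.ravel()

# Naive: sort all m = n^2 scores just to take the top k — O(m log m).
top_slow = np.sort(flat)[-k:][::-1]

# Partition-based selection moves the k largest to the end in O(m),
# then sorts only those k — O(m + k log k) overall.
idx = np.argpartition(flat, -k)[-k:]
top_fast = np.sort(flat[idx])[::-1]

assert np.allclose(top_slow, top_fast)
```

In pure Python, `heapq.nlargest` expresses the same "select, don't sort" idea.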

Bob

Yeah, because the PRs are really rich. It's not only the diff; there's a nice chunk of text explaining it as well. So it's really nice.

Wins In OSS And Enterprise Code

Saurabh

I think what we're realizing is that optimization requires trust from users. To build that, we have all this rich data that we create, which knows exactly what's going on during the code execution and everything else. So imagine this bot is giving you a new optimization; you're naturally going to be a bit skeptical: is this actually going to be better? So to make it easier for you to accept that change, we give you all the data you may need, in a very easy form, so you can be confident that if you merge this in, it's actually going to speed up your code, and it's also not going to break anything in your codebase.

Bob

Yeah. But you still need the human in the loop, right? It's an automated tool, so it makes a suggestion, but then you open a PR, right? And that can be accepted or rejected. And I guess it can sometimes get it wrong as well, right?

Saurabh

Yeah, I think so. Since we are modifying the source code itself, we want the developers to be responsible for the source code. So we try our best to figure out the best option. But at the end of the day, it's the responsibility of the programmer to accept that change or not. Or they can even modify it, if they only like certain pieces of it, for example.

Bob

Yeah. And how creative does it get? Like, if you see pure Python being used for matrix operations, would it then bring in a library like NumPy or Pandas? Is it okay doing that? Because I can imagine some projects don't want to bring in too many external dependencies, right? So is that allowed?

Saurabh

Yeah. What usually ends up happening is, if their environment already has NumPy installed, for example, then it will suggest NumPy optimizations. Because you don't want to do a list operation for matrix multiplications; you certainly want to use NumPy when it's possible. So Codeflash does that quite a lot. And I have noticed it can do things like using orjson instead of json, which can run a lot faster, things like that. We are also building capabilities right now to automatically convert NumPy code to Numba code. The idea is that if you have a lot of NumPy-heavy or numerically heavy code, Codeflash can automatically convert that to Numba and JIT compile it so that it runs a lot faster. And it can tell you: hey, by the way, I also found this compiled version of the code, and it makes it 10x faster, for example. So there's a big search space of different types of optimizations.
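The kind of rewrite being described, shown as a toy example rather than a real Codeflash diff: a pure-Python, list-of-lists matrix multiply versus the equivalent NumPy call, which dispatches to compiled BLAS routines:

```python
import numpy as np

def matmul_lists(A, B):
    # Pure-Python triple loop: interpreter overhead on every multiply-add.
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C

rng = np.random.default_rng(1)
A = rng.random((40, 40))
B = rng.random((40, 40))

C_lists = matmul_lists(A.tolist(), B.tolist())
C_numpy = A @ B  # single call into compiled code

assert np.allclose(C_lists, C_numpy)
```

The gap widens with matrix size: the Python version does n³ interpreted iterations, while `@` does the whole product in one compiled call.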

Bob

Yeah, but a lot is in the context, right? You read in the context of the repo and suggest the optimization if it makes sense, depending on the product. Yeah.

Saurabh

Yeah, the good thing is that we verify things very rigorously. So when we do suggest something, we're quite confident that it's actually correct and faster. And sometimes it's a subjective choice by the programmer as well: maybe I don't want this code to be changed, what I had was good enough, I don't care. Those decisions are something the programmer can make.

Bob

Yeah.

Saurabh

Yeah.

Bob

What are some common optimizations you see over and over again? I liked that example you had in your article the other day about deep copying, which probably started innocent, and then the data structure grew, and all of a sudden there was a lot of copying happening, and that led to a tremendous slowdown. So copying data could be one. What are some things you see over and over again?
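The deep-copy trap Bob mentions, in miniature: `copy.deepcopy` clones every nested list on each call even when only one top-level field changes. The shallow alternative shares the untouched rows, which is only safe if callers don't mutate them afterwards (a toy illustration, not the example from the article):

```python
import copy
import timeit

state = {"rows": [list(range(50)) for _ in range(200)], "label": "batch"}

def with_deepcopy(state):
    # Clones all 200 nested lists even though only the label changes.
    s = copy.deepcopy(state)
    s["label"] = "done"
    return s

def without_deepcopy(state):
    # Shallow copy of the top level; "rows" is shared, not cloned.
    return {**state, "label": "done"}

t_deep = timeit.timeit(lambda: with_deepcopy(state), number=200)
t_shallow = timeit.timeit(lambda: without_deepcopy(state), number=200)
print(f"deepcopy: {t_deep:.4f}s  shallow: {t_shallow:.4f}s")
```

As the nested structure grows, the deepcopy cost grows with it while the shallow copy stays constant — the slow creep described in the episode.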

Teaching Engineers Through Optimised PRs

Saurabh

I think one thing I see is NumPy being used incorrectly. Numerical operations can be quite heavy, and Python is actually quite good with its numerical libraries, but NumPy has so many different operations and functions, and how you actually implement your algorithm really affects the performance. If you're looping through some array in Python rather than using a NumPy function, that looping is going to be a lot slower than using the intrinsic NumPy function. And this is what we see again and again; whenever I see NumPy being used, I get happy, because Codeflash will find a ton of optimizations there. So that's a big one. And whenever people are trying to implement something complicated, there are a lot of different ways you can implement it, and it gets really hard to figure out what's the best way. That's where Codeflash has some of its best results, where it can discover really, really advanced algorithms.
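The anti-pattern Saurabh describes, as a toy example: iterating over a NumPy array at the Python level versus staying inside the vectorized intrinsics:

```python
import numpy as np

x = np.random.default_rng(2).random(100_000)

def clipped_sum_loop(arr):
    # Python-level loop: each element is boxed into a Python float, and
    # the comparison and addition go through the interpreter every time.
    total = 0.0
    for v in arr:
        if v > 0.5:
            total += v
    return total

def clipped_sum_vec(arr):
    # Boolean masking and .sum() run entirely in NumPy's compiled kernels.
    return arr[arr > 0.5].sum()

assert np.isclose(clipped_sum_loop(x), clipped_sum_vec(x))
```

Same answer, but the loop version typically runs orders of magnitude slower on large arrays, which is exactly the class of rewrite an optimizer finds over and over.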

Bob

Yeah. Does it also lead to shorter code overall, or is that not really the case?

Saurabh

Sometimes. It's context-dependent, but we do see shorter code — actually a lot of the time — because in Python, you usually get more performance when you use the internal libraries more effectively. Instead of implementing your own logic or algorithm in Python, if you use the right library and the right function within it, that's going to be a lot faster. So Codeflash often says: if you re-architect your code to just use this one function, that makes your code faster. Using more abstract methods in Python that do a lot of work internally, at the CPython level or in a library that might be compiled, leads to a lot of performance gains. That's where we have seen code actually getting a lot shorter.

Bob

Yeah, for example the built-ins, right? A lot of these are in C. So it's very concise functional code, but it's also code that's already C-optimized, right? Exactly. Yeah. Nice. And are there also cases where it says: well, not much to do here, maybe go to a compiled language?
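Bob's point about C-implemented built-ins, measured directly: `sum()` runs its loop inside CPython's C code, while the hand-written version dispatches bytecode on every iteration:

```python
import timeit

data = list(range(100_000))

def manual_total(xs):
    # Interpreted loop: one round of bytecode dispatch per element.
    total = 0
    for x in xs:
        total += x
    return total

t_loop = timeit.timeit(lambda: manual_total(data), number=50)
t_builtin = timeit.timeit(lambda: sum(data), number=50)  # C-level loop

assert manual_total(data) == sum(data)
print(f"manual loop: {t_loop:.4f}s  built-in sum: {t_builtin:.4f}s")
```

The same reasoning applies to `min`, `max`, `any`, `all`, and much of `itertools`: concise code that is also already compiled.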

Saurabh

Yeah. So that's the thing: you can't optimize everything, and you don't even want to optimize everything. Our approach is that we don't really make any assumptions; we don't have, for example, a rule-based engine that looks at a certain pattern, like convert a for loop into a list comprehension. We don't do any of that. We try to empirically prove: is this actually running faster in reality across this given test set? If we can prove it's actually faster, then it becomes a valid optimization for us. And many times that just does not happen, right? Because whatever the original code was doing was quite simple and straightforward; there isn't really any better way to do it. That happens, and that's fine. Codeflash will actually say: the speed is the same, there's no faster version here. And that's completely acceptable. What we want Codeflash to do is, when it finds an optimization, be really high confidence and really high quality, and make it really easy to accept those changes.

Bob

Yeah. Right. But what do you think of the whole Rust — not hype, but trend — of a lot of tools being written in Rust?

Dependencies, NumPy, Numba, And JIT Paths

Saurabh

I mean, that's great. The whole Python ecosystem relies on compiled backends, and Rust is amazing for writing really performant backends; I think it also integrates well with Python. So we have looked into converting Python code into Rust automatically. We haven't really done it or shipped it, but it's on our mind. But even within Python, there are a lot of compilation options that not many people are aware of. For NumPy code there's Numba, where if you write your code a particular way, you can use LLVM to compile it down to compiled code. That can speed up your code a lot if you're doing something numerical. Then there's something called mypyc: if your code is mypy-strict compatible, you can compile it. One of the main reasons Python can be slow is that it's dynamic. As it's executing the code, it has to make a lot of determinations about the code it's about to run. If there's a line, for example, C = A + B, then as it executes, it has to understand: what is A? Okay, it's an object — what kind? Okay, an integer, for example. Cool. And what is B? Same thing; maybe it's a float. Okay, cool. Then it has to figure out: what does plus mean between an int and a float? Okay, cool, apply it, and store the result into C. In a compiled language, you can do this in one or two instructions. So there's all this overhead in Python, and one of the biggest ways to speed up Python is to remove the bottleneck of the interpreter.
And that's where compiling code, or JIT compiling code, really helps. With mypyc, for example, when you strongly type your code, those determinations can be made at compile time. You can know that A is going to be an int and B is going to be a float, and exactly which plus operation applies. So everything is resolved at compile time, and at runtime it just executes the same instructions that compiled code would. So there are a bunch of different techniques available. And I think Black, the formatter, uses mypyc to compile its code.
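A sketch of what mypy-strict-compatible code looks like: the annotations are what would let mypyc resolve the `+` and `*` dispatch at compile time instead of per operation at runtime. Compiling would be roughly `mypyc module.py` (assuming mypyc is installed); the snippet runs unchanged as plain Python either way:

```python
# With these annotations, mypyc can decide at compile time that the
# arithmetic below is float arithmetic, rather than looking up __add__
# and __mul__ dynamically on every loop iteration.
def weighted_total(values: list[float], weight: float) -> float:
    total: float = 0.0
    for v in values:
        total += v * weight
    return total

print(weighted_total([1.0, 2.0, 3.0], 2.0))
```

This is also why untyped or highly dynamic code gains little from mypyc: without the type information, the compiler has to fall back to the same generic dispatch the interpreter does.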

Bob

Interesting. Yeah, that might be something not many people know, actually. Good to explore all the options of the language before venturing out, right? Because there's a trade-off: writing Rust code might not be as easy as writing Python. Exactly. Yeah. So, Python is getting faster as well, right? With every release, and now you have free threading and that kind of stuff. What are you excited about that Python is working on, performance-wise?

Saurabh

I think it's interesting. I'm always excited about Python getting faster, though I don't think I have any particular insights for you there. Free threading is interesting, because one of the biggest problems has been the lack of really good native parallelism in Python. You essentially just have to create new processes, usually, and that can be really heavy. So free threading helps there. One of the best ways today is just using an underlying library, for example JAX or PyTorch; they can be quite amazing at exploiting the whole capability of the system. But I feel there's so much more work that can be done to speed up Python. I think a JIT is probably the most interesting thing that can come in the future. If you think about JavaScript, it's kind of similar to Python, in that it's also dynamic, but it has a very strong JIT compiler that makes it, in many cases, a lot faster than Python at executing raw code. If we can implement a great JIT in Python, that would really, really help. There are practical considerations, because the design of Python makes it hard to JIT compile while preserving all the existing behavior, but if we can do it, I think that would be one of the biggest gains. PyPy, for example, already does it.
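A minimal sketch of the parallelism point: under the GIL, these threads interleave on CPU-bound work, while a free-threaded build (PEP 703) can run them truly in parallel; today, process pools sidestep the GIL instead, at the cost of the heavy process startup Saurabh mentions:

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    # CPU-bound work: on a GIL build, only one thread runs this at a time.
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cpu_bound, [10_000, 20_000, 30_000]))

print(results)
```

The code is identical on both builds; only the scheduling changes. Swapping in `ProcessPoolExecutor` gives real parallelism on GIL builds but pays process creation and pickling overhead per task.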

When Optimisation Isn’t Worth It

Julian

Just a quick break. Let me ask you a question. How much of your last pull request did you actually write? And how much did AI write? If Copilot or ChatGPT disappeared tomorrow, would you still know how your code works, and could you explain it in a code review? This is the problem we hear about the most from developers like you who reach out to us for a chat. The Pybites Developer Mindset program helps you become the developer who uses AI effectively, not the one who is completely dependent on it. Through one-to-one coaching, real-world projects, and proper code reviews with an expert coach, not AI, you'll actually understand the Python code that you ship. If you're tired of feeling like a prompt engineer instead of a real developer, check out and apply for PDM using the link in the description. Now back to the episode.

Bob

So, back to the tool. You have different integrations, right? You can run Codeflash in a GitHub Action, from the command line, or in VS Code. Maybe for people starting out fresh with this tool: I just installed it with uvx — uvx codeflash, then init — and you can get running, right? And you might need a token on GitHub, in your repo, for the action, but that's about it, right? To get started.

Python Speed, JIT Dreams, And Tooling

Saurabh

Yeah, so there are two primary ways of using Codeflash. One is to optimize all your existing code. If you already have a project, you can run codeflash --all on it. Or if you have a workload, say python benchmark.py, that does a few things, you can run codeflash optimize benchmark.py. It will go in, run your script, trace it, figure out the problems it sees, ensure correctness over that script, and find performance optimizations for it. So that's a couple of ways to optimize existing code. And one of the things we want to make happen is for all new code that people write to always be optimal. Because according to me, the root cause of software being slow is that it's expensive and hard to do optimization as you're writing your software. What ends up happening is you're trying to rush a feature, trying to implement something quickly, and it's hard to figure out the best way of doing it, so you just don't. And yeah, people quote "premature optimization is the root of all evil," but that's a misquote; that's actually not what he said. People say it because it takes attention and effort to optimize code. So what we want is: when Codeflash automates performance optimization, all new code people write can always be optimal, from the first implementation and the first attempt. You don't have to go back to it later when you discover a bottleneck in production. We call it continuous optimization — like continuous deployment, the same idea.
And I think that's the primary way of using Codeflash: you can install it as a VS Code extension or a GitHub Action. The GitHub Action is particularly useful: if you're working on a project and you create a pull request, Codeflash will run as a GitHub Action within the pull request itself, look at all the new code you wrote, and try to figure out whether there was a better way to do the same thing. It optimizes in the background, and if it finds an optimization, it creates either a dependent PR or an inline suggestion saying: look at this change; if you accept it, it will make your code this much faster. You can quickly browse through it, and if you like it, accept it. So even before you ship your code or merge to main, your code is always optimal. And that's a superpower I would like to enable in the world.

Bob

Yeah. Maybe because it takes a little while, right? So it might be more convenient to run it as a GitHub Action than in a separate terminal window. And the VS Code integration, how would that work? How is that different?

Saurabh

It's fairly similar. It also optimizes new code you're writing. With the VS Code one, in the editor window itself, above the function name, you can click on an "optimize" action. When you click on it, it spawns a CodeFlash process in the background that tries to optimize the function. If it finds an optimization, we say, hey, look at this thing, it's ready for you to accept. That's one way. Or you can also optimize the new code you've written in a new commit, so all your new code can be optimal that way too. It can be an alternative to the GitHub Action, but since things are running locally on your machine, you can interact with it much more directly: accept code, change your code. It gives you a lot more flexibility.

Bob

Yeah. But it has to be a Python project, right? No standalone scripts; you need a pyproject.toml. Ideally tests too, but those aren't required.

Saurabh

Yeah, right now we do require a pyproject.toml. It just asks a few things: hey, where are your tests located, if you have any? Where is your code, what's your module root, for example? Do you use a formatter? We want to make sure we format the code the same way. So we ask for a few simple configuration settings, and then you're set up. It takes about two minutes, quite simple. And then you can start optimizing.
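For reference, the pyproject.toml section for these settings might look roughly like this. The key names below are illustrative guesses based only on what Saurabh describes (tests location, module root, formatter); check the CodeFlash documentation for the actual schema.

```toml
# Hypothetical sketch of the kind of configuration described in the episode.
[tool.codeflash]
module-root = "src"                # where your code lives
tests-root = "tests"               # where your tests live, if any
formatter-cmds = ["black $file"]   # so suggestions match your formatting
```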

Bob

Nice. Cool, cool. So, back to some of the challenges while building this. One we mentioned before: false positives. I'll just list out what I thought of, but you can fill me in. So, false positives. And you do some special type of testing, I forgot the name, not cyclomatic. You know what I mean, probably. And then also... sorry?

Saurabh

Concolic testing.

Bob

Yes, concolic, exactly. That one. So I'm curious why that type of testing was needed, whether it was related to one of the challenges. And then benchmarking as well, right? That could be a challenge too, because how do you emulate a similar system? Or there might be side effects. Those I can imagine being challenges, but I'm happy for you to list any others or just detail these.

Saurabh

Yeah, these are great points, actually. Optimization essentially hasn't been solved really well because of those challenges. Correctness is a big one, because that's the fundamental problem to solve here. If you think about compilers and the way they optimize, they give you the guarantee that the code being optimized has the same behavior as before. But if you change the source code itself, you don't get those guarantees, because the new code may just be doing something different in some edge case, and that gets hard to determine. So we actually spend a lot of time and effort on our side testing the code for correctness, and we have multiple different ways of doing it. Our philosophy is essentially regression testing: we take the original code, execute it with a lot of different inputs, and observe the code as it executes to see what its behavior is. A basic way of testing could be, for a given input to a function, what are the return values? But it can do other things as well. It can mutate the inputs themselves, and then we want to know, is the optimization mutating the inputs the same way? It can raise an exception, it can have side effects. So as we execute the code, we analyze all these different modalities of the function's behavior and try to ensure that the optimization we're proposing behaves the same across all of them. And now the problem becomes an input generation problem. If you have any existing tests, we rerun them and make sure the optimization is actually correct over those cases. But existing tests are rarely enough to prove correctness.
First of all, they don't exist most of the time, and even when they do, they may only be testing a few conditions. So we rely on LLMs to create a really wide and diverse set of tests: base cases, edge cases, some larger cases as well, to know what's going on. For each of them, we check: is the behavior actually the same across the original code and the optimization we're proposing? If the result differs for even a single input, then it's not a correct optimization, and we discard it.
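The behavior-comparison idea Saurabh describes can be sketched in a few lines. This is not CodeFlash's actual harness, just a toy illustration: the `original`/`optimized` pair and the input list are invented, and a real system would compare many more behavior modalities over far more generated inputs.

```python
import heapq

def original(nums):
    """Original implementation: global sort, then take the top three."""
    return sorted(nums, reverse=True)[:3]

def optimized(nums):
    """Candidate optimization: a heap avoids the full global sort."""
    return heapq.nlargest(3, nums)

def behaviors_match(f, g, test_inputs):
    """Compare return values, raised exception types, and argument mutation."""
    for args in test_inputs:
        a1, a2 = list(args), list(args)  # independent copies to detect mutation
        r1 = r2 = e1 = e2 = None
        try:
            r1 = f(a1)
        except Exception as exc:
            e1 = type(exc)
        try:
            r2 = g(a2)
        except Exception as exc:
            e2 = type(exc)
        if r1 != r2 or e1 != e2 or a1 != a2:
            return False  # any divergence on any input rejects the candidate
    return True

inputs = [[], [1], [5, 2, 9, 1], list(range(1000)), [3, 3, 3], [-1, -5, 0]]
print(behaviors_match(original, optimized, inputs))  # → True
```

Note how a candidate that returned the right values but mutated its argument would still be rejected, which matches the "same behavior across all modalities" requirement.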

unknown

Yeah.

Saurabh

So yeah, we do a bunch of that. We have tracing, and we have concolic testing as well, and I can talk a bit about that. Concolic testing is a hybrid formal-verification approach to verifying correctness. "Concolic" is not an acronym but an abbreviation of concrete plus symbolic testing. One way to think about formal verification: you have two mathematical functions, it's not Python code anymore, and through some mathematical transformations you prove that these two functions are equivalent. It's like what we did in high school: prove that this function reduces to that function. Same idea. You can actually do this for strongly typed languages, where things are a lot more deterministic and you're working over essentially the same kind of mathematical function in the code. But in Python it gets tricky, because the language is dynamic. Your input types may not be rigorously defined, and things can get strange in Python. When you execute a function, it could be doing a lot of things internally; it may have edge cases, so the behavior may not be so deterministic. So what concolic testing does is combine symbolic execution, which maps the relationships between different statements instead of actually executing the code, with concrete execution, to know the values of the different variables as the code runs. It's a mixture of both.
The way we apply it is to try to maximize coverage over the code: generate inputs to run the function so that coverage over that function is maximized. That's essentially the problem concolic testing is trying to solve, and it does it through SMT solving, using Z3 solvers; there are some internals in there. Let me illustrate with one example. Suppose there's an if condition in the code: if x is more than three, take this branch, otherwise take the other branch. And suppose you execute the code and x is actually five, so you take the first branch. The question becomes: how do I change the input to this function so that this internal variable x is three or less, so that I take the else branch? That leads to a series of equations, and you solve those equations to figure out: if I want to take that other branch, what should the input be? That's how it solves for this.
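Saurabh's x-greater-than-three example can be made concrete. A real concolic tester hands the negated path condition to an SMT solver such as Z3; this toy sketch hand-solves the single linear constraint instead, so `solve_negation` is a stand-in for the solver, not any real API.

```python
def f(x):
    if x > 3:          # the branch condition under analysis
        return "big"
    return "small"

def solve_negation(threshold, observed):
    """Toy 'solver': given that `observed > threshold` held on a concrete run,
    return an input satisfying the negated path condition x <= threshold."""
    assert observed > threshold
    return threshold   # x = threshold makes x <= threshold true

# Concrete run: x = 5 takes the `x > 3` branch.
covered = {f(5)}
# Symbolic step: negate the path condition x > 3 and solve for a new input.
covered.add(f(solve_negation(3, 5)))
print(sorted(covered))  # → ['big', 'small'], both branches covered
```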

Bob

This reminds me of some of those Advent of Code puzzles, right, where you need to use a solver to work backwards to the initial value in some sort of chain. So yeah, that's a clear example. It's not only testing x greater than three and x less than three so that your if and else give you 100% coverage, strictly speaking; it's also going one step back: what inputs can trigger those two conditions? So it all comes down to more robust testing, so that your optimizations are verified in a more reliable way, right? Exactly.

Saurabh

And CodeFlash uses a coverage testing tool as well, to ensure that the inputs and the testing we're doing actually cover most of the branches in the code. So we can be more confident that we're not really missing any edge cases.
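One way to see the coverage idea is a tiny line-coverage tracer built on the standard library's `sys.settrace`. This is a from-scratch sketch, not the coverage tool CodeFlash uses, and `classify` is a made-up function: one input leaves a branch unvisited, and a second input closes the gap.

```python
import sys

def trace_lines(func, *args):
    """Record which line offsets of `func` execute for the given arguments."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always restore, even if func raises
    return executed

def classify(x):
    if x > 3:
        return "big"
    return "small"

one_input = trace_lines(classify, 5)         # misses the else branch
both = one_input | trace_lines(classify, 0)  # the second input adds new lines
print(len(both) > len(one_input))  # → True
```

Input generation then becomes a search for inputs that grow this executed-lines set until no branch is left unvisited.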

Bob

Yeah. Awesome. Oh, we could talk for hours about this, it's fascinating, but we have to wrap up a bit. Last question then: this is open source, right? And we're grateful for that, because it's a great tool to add to your toolchain. But you're also turning this into a business. Do you want to talk a bit about that and how you balance the two? What's the business model, and how do you balance it with open source?

Saurabh

Yeah, so the client is source-available, but it still talks to our backend to generate the optimizations and integrate with GitHub and everything. That's one part. But we have a very generous free tier that gives access to a lot of the LLMs we use, so if you're working on small programs or small projects, you can use it essentially for free. And we want that: we want people to write optimal code, so we want to encourage it. But we're also a business, and the way that works is we sell to bigger companies that see a lot of value in writing optimized code. Think about the scale these companies work at: they have so much Python code, and even if you speed up something small, they're running it at such a scale that the costs really add up, so they can save a lot of money. Plus they really like lower latencies, because that results in a much better user experience in various ways. If you're running an analysis and you can run it a lot faster, you can get through a lot more analyses, or iterate a lot faster. Or if you have an ML model, your users want it to be really fast, and if you don't make it fast, they'll probably choose your competitors. So there's always this competition to have a really fast product. And right now there isn't really a great option for these companies other than asking some really senior people to go and optimize it, and that's expensive. If they use CodeFlash, they get all these benefits of lower costs, reduced latencies, faster processing, and everything. Plus, you don't have to ask your engineers to do all the optimization all the time.
They can work on other problems instead. So it's a big win for these companies, and that's where we charge them for access to CodeFlash and help them optimize as well.

Bob

Yeah, and that's a nice business case, right? Faster solutions can be the difference between winning or losing a customer. Happy developers, too: they get a kick out of writing fast code. And compute power: if your code is efficient, you pay less to cloud providers because you need less compute. So optimization is just good. And it's hard as well, right? Like uv and Ruff. Those tools are so impressive because of their speed. The fact that you can now run Ruff with a shortcut in Vim, that's of course very appealing. So yeah, exactly.

Saurabh

Our goal is to figure out: how do we make all the code out there always be optimal? So that's what we're working on.

Bob

Yeah, nice. It's amazing, because if there's an LLM involved, there will also be some cost involved, even for a free-tier user, so it's impressive how you balance that. And thanks for giving us this tool, it's really cool. Do we have any final shout-out, word of advice, or recommendation for the audience before we wrap?

Saurabh

I think it's related to CodeFlash itself. Since I started using CodeFlash, I've realized there are so many times we write slow code, and it's just impractical to figure out whether it can be made faster. After we started using CodeFlash internally as well, it really changed the way we think about writing fast code. I think the future of writing fast code is that it gets automated: these programs we're building can do so much more work to find the best version. What we see happening right now is that people use AI agents to come up with the first working version of the code, and then things like CodeFlash figure out the most performant version of it. So I'd encourage people to try out CodeFlash for free and see how their workflow changes: you can quickly write new code with AI coding agents and also have it all be really optimal. What we've seen is that when people use coding agents, the code they write can be slower than if an expert had written it. But with CodeFlash you can get the best of both worlds.

Bob

Yeah. Awesome. And again, it's a great learning tool. If you use it on your PRs, you're going to learn regardless. Apart from using your tool, which we all should, is there any other resource you can recommend for people who want to really learn how to write faster Python? What helped you, apart from learning from your own tool, which is wicked?

Saurabh

It's hard to say. Honestly, I think CodeFlash helps because it's in-context learning. But other than that, the fundamentals of data structures help a lot. Understanding how the language works helps a lot: why is it interpreted? Why does NumPy help, for example, versus writing a pure-Python list-based implementation? Those sorts of things help quite a lot. And just understanding how processes work, how the memory hierarchy works: those fundamentals really help in writing fast code. Once you understand that, for all the code you write, you always have it in the back of your mind: hey, is this the right way to do it? And I think that leads to really high-performance code in the future.
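A classic instance of the data-structure fundamentals Saurabh means: membership in a list is a linear scan, while membership in a set is a hash lookup. This sketch times a worst-case lookup with the standard library's `timeit`; exact numbers will vary by machine.

```python
import timeit

# Membership in a list is O(n); membership in a set averages O(1).
items_list = list(range(100_000))
items_set = set(items_list)

needle = 99_999  # worst case for the list: it is scanned to the very end
t_list = timeit.timeit(lambda: needle in items_list, number=200)
t_set = timeit.timeit(lambda: needle in items_set, number=200)

print(f"list: {t_list:.4f}s  set: {t_set:.6f}s  speedup: {t_list / t_set:.0f}x")
```

Knowing this, the one-line change from `x in my_list` to `x in my_set` inside a hot loop is exactly the kind of fix that "fundamentals in the back of your mind" surfaces.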

Bob

Yeah, great advice. Going back to the fundamentals and really trying to understand how the underlying mechanics work.

Saurabh

Yeah.

unknown

Yeah.

Saurabh

And yeah, at the end of the day, when you're optimizing something, you have some benchmark to iterate against. You essentially try out ideas and see if they're faster, and you use a profiler to figure out, hey, where should I focus my attention? All those things help quite a lot.
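The "use a profiler to focus your attention" advice can be shown with the standard library's `cProfile` and `pstats`. The workload functions here are invented purely to give the profiler a visible hotspot.

```python
import cProfile
import io
import pstats

def slow_part(n):
    # Deliberately heavy work: the hotspot we expect the profiler to surface.
    return sum(i * i for i in range(n))

def fast_part(n):
    return n - 1

def workload():
    for _ in range(50):
        slow_part(10_000)
        fast_part(10_000)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("slow_part" in report)
```

The report puts `slow_part` near the top, so that is where optimization effort should go; `fast_part` barely registers despite being called just as often.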

Bob

That's an important point, actually, right? Because sometimes we sense that something is slow, but until we start profiling and really looking at where the calls are happening, we might be optimizing the wrong thing.

Saurabh

Yeah, for sure, that's true. And there are a lot of gotchas as well: when you run a profiler, you're making assumptions about how the code executes in production. Sometimes you have that information, but the distribution of inputs in production can be quite extensive. When you create a benchmark, you're essentially saying, this is the input I care about. That can help, but you may also end up regressing other inputs that you didn't think of. So it can be more complicated, and this is the type of problem we think about at CodeFlash. The way we do it is we test performance across a distribution of inputs, test across all of them, and then know: hey, this input is faster by this many percent, that input may be a bit slower. We do all that determination to figure out the best optimization. So yeah, it can get complicated, but I think it's a fun exercise.
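Benchmarking across a distribution of inputs rather than a single case, as Saurabh describes, can be sketched with `timeit`. The string-building functions and size buckets below are made up for illustration; the point is to check correctness on every input and time each variant across the whole distribution, since the winner can differ by input size.

```python
import timeit

def concat(parts):
    """Build a string by repeated concatenation: O(n^2) in the worst case."""
    s = ""
    for p in parts:
        s += p
    return s

def joined(parts):
    """Build the same string with str.join: a single O(n) pass."""
    return "".join(parts)

# A distribution of inputs rather than one benchmark case.
distribution = {
    "tiny": ["x"] * 5,
    "medium": ["word"] * 2_000,
    "large": ["word"] * 50_000,
}

for name, parts in distribution.items():
    assert concat(parts) == joined(parts)  # never trade correctness for speed
    t_concat = timeit.timeit(lambda: concat(parts), number=5)
    t_join = timeit.timeit(lambda: joined(parts), number=5)
    print(f"{name:>6}: concat={t_concat:.4f}s join={t_join:.4f}s")
```

A change that only wins on the "large" bucket while regressing "tiny" is exactly the kind of trade-off this per-bucket view makes visible.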

Bob

Yeah, fascinating work. And thanks for sharing today on our podcast. I think people will learn a lot, and I hope they're going to check this out. I've been using it for a couple of PRs and yeah, it's really cool. So thank you. Thank you for inviting me. Thanks, Saurabh, for joining today, and for the work you do. Thank you. All right, cheers.

Julian

Hey everyone, thanks for tuning into the Pybites Podcast. I really hope you enjoyed it. A quick message from me and Bob before you go: to get the most out of your experience with Pybites, including learning more Python, engaging with other developers, learning about our guests, discussing these podcast episodes, and much, much more, please join our community at pybites.circle.so. The link is on the screen if you're watching this on YouTube, and it's in the show notes for everyone else. When you join, make sure you introduce yourself and engage with myself, Bob, and the many other developers in the community. It's one of the greatest things you can do to expand your knowledge, reach, and network as a Python developer. We'll see you in the next episode, and we will see you in the community.