Pybites Podcast

#215: Arthur Pastel on creating actionable optimisations with CodSpeed

Julian Sequeira & Bob Belderbos Episode 215

In this episode, Bob sits down with Arthur, a Python engineer based in France and the creator of CodSpeed, to dig into a problem many teams don’t notice until it’s too late: performance regressions. Arthur shares the story behind building CodSpeed, starting with real-world pain points from robotics and machine-learning pipelines where small slowdowns quietly piled up and broke systems in production. 

We discuss how CodSpeed fits into everyday developer workflows and why treating performance checks like tests or coverage changes how teams ship code. Arthur also shares how open source shaped the product’s mission, the surprising environmental impact of avoiding tiny regressions at scale, and why AI-driven coding makes performance guardrails more important than ever.

Reach out to Arthur on LinkedIn: https://www.linkedin.com/in/arthurpastel/ 

Arthur's website: https://apas.tel/

Check out CodSpeed: https://codspeed.io/

GitHub: https://github.com/CodSpeedHQ/codspeed

___

💡🧑‍💻 Want to become a more focused, motivated, and effective Python developer? The Pybites Developer Mindset (PDM) Program helps you build strong habits, deepen your skills, and make real progress in just six weeks. Join us for mentorship, accountability, and a supportive community that keeps you moving forward. Start your PDM journey now! 🌟✅ https://pybit.es/catalogue/the-pdm-program/

___

If you found this podcast helpful, please consider following us!
Start Here with Pybites: https://pybit.es

Arthur:

In the beginning the goal was to make any kind of software faster, but since we started with Pydantic, it also created a strong open source DNA in the product: we want to bring the product to open source projects so they can benefit from improving their performance and avoiding regressions.

Julian:

Hello and welcome to the Pybites Podcast, where we talk about Python, career, and mindset. We are your hosts. I'm Julian Sequeira, and I am Bob Belderbos.

Bob:

If you're looking to improve your Python, your career, and learn the mindset for success, this is the podcast for you. Let's get started. Hello and welcome back, everybody, to the Pybites Podcast. I'm Bob Belderbos, and I'm here with Arthur Pastel. Arthur, welcome to the show.

Arthur:

Thanks, thanks for having me here.

unknown:

Hi.

Arthur:

Great, great. We are coming off a launch week, so a lot of things happened last week. But yeah, really excited to be here.

Bob:

Oh, right. Yeah, I definitely want to hear about that, because we're going to cover a lot of CodSpeed, your benchmarking tool. But before doing that, maybe you want to give us a quick intro: who you are, what you do, and then we dive into the cool tool you've built.

Arthur:

Yes. So I'm Arthur, I'm a software engineer based in France, and I've spent a lot of time building in Python. I started with building APIs as part of my first jobs, and then also got started with open source and built an object document mapper, some kind of ORM, but for MongoDB. And then I got started building CodSpeed, really focusing on software performance, benchmarking, and essentially measuring performance early in the development cycle. So it's kind of like a test framework, like for unit tests or integration tests, but for performance, and for avoiding a big mess in production when you didn't expect it.

Bob:

Yeah, cool. And yeah, you built that ORM as well. I saw that on your GitHub, and it has quite some stars. So do you want to touch on that a bit as well, why you built that one?

Arthur:

Yeah, yeah, definitely. So it's ODMantic. In the beginning I was using MongoDB, in the same job I mentioned before, and I transitioned from Flask to FastAPI, using a lot of typed Python with Pydantic and essentially discovering the power of the typing system in Python. From there I was like, okay, I want this, but for my database access. I had used Django a bit before, and I saw that we could leverage Pydantic for database access as well. So I got started with it, built it, got some people starting to use it, and Sebastián Ramírez (tiangolo) from FastAPI was pretty excited about it as well, so he gave me a hand. And Noah is helping me maintain it, because I don't have that much time to spend on it anymore, even if I hope I will be able to work on its performance to make it even faster and combine my two passions, MongoDB and performance. Maybe MongoDB is not that much of a passion, but essentially working on the performance of this project. It's really discovering Pydantic and everything that made me want to build something like this. And I think Sebastián Ramírez built the same thing for Postgres and MySQL; it's called SQLModel. Essentially it's pretty much the same, and the experience is really good as well.

Bob:

Yeah, we love SQLModel. We just launched a learning path on the platform about that. Maybe I should now do another learning path on yours. So the difference is that yours is more focused on MongoDB, right?

Arthur:

Yeah, I mean it's probably not the only difference, but really ODMantic was focused on document-oriented databases. In the beginning I wanted to support all document databases, but for now it's just MongoDB. The idea is that you should be able to use it with other document databases as well, like DynamoDB and other stuff, but later on.

Bob:

Yeah, last question on that, because I'm more of a relational database person, you know, that's my mental model. So when would I use a document database? What's the best usage?

Arthur:

To me, the biggest benefit, and it was mostly before, was not having a schema. I mean, it's kind of weird to say you don't have a schema and then you write a Pydantic model that's actually the schema. But you don't really have to write migrations, and you can be really flexible about what data you store, having embedded documents as well. So I think for prototyping it's really helpful. For example, for CodSpeed, the project I've been working on for the last three years, we started with MongoDB mostly because of this, because in the beginning we didn't know exactly what the data relations would be and how we would store things. It gave us the flexibility to say, okay, we'll just change the way we access the data, and you're not really tied to schemas and everything. So it's mostly flexibility, and also being able to embed documents, which is really helpful. For example, when we store the results, we also store some additional data related specifically to one instrument that measures performance or memory or whatever, and it's helpful for this. But in the end, once you do have a schema, it's pretty close. Even the capabilities: in the beginning MongoDB was not that great at building relational queries, like joining tables, but now, if you know how to query the database, it's essentially the same in terms of performance, except the query language is pretty ugly. That's also why I built ODMantic, to make it nicer.
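
To make the schema-is-the-model idea concrete, here is a minimal sketch of what an ODMantic model with an embedded document might look like, assuming the odmantic package and a local MongoDB instance; the class names, fields, and database name are invented for illustration, and the exact API should be checked against the ODMantic docs.

```python
import asyncio
from typing import List

from odmantic import AIOEngine, EmbeddedModel, Model


class BenchmarkResult(EmbeddedModel):
    # Embedded document: stored inline in the parent, no separate collection or join.
    instrument: str
    value_ns: float


class Run(Model):
    # The Pydantic-style model doubles as the document schema; no migration step.
    commit: str
    results: List[BenchmarkResult]


async def main() -> None:
    engine = AIOEngine(database="example_db")  # hypothetical database, local MongoDB
    run = Run(commit="abc123", results=[BenchmarkResult(instrument="cpu", value_ns=1520.0)])
    await engine.save(run)


asyncio.run(main())
```

The embedded results live inside the parent document, so there is nothing to migrate or join, which is the flexibility Arthur is pointing at.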

Bob:

Yeah, interesting. Thanks for clarifying. But let's move on to CodSpeed, your more recent endeavor. You said you were already working on performance and benchmarking, but was there a particular trigger where you said, wow, we really need to have this tool? Where did it all start?

Arthur:

It started at that same moment when I built ODMantic. I was building a robotics application in Python. The goal was to run machine learning pipelines on robots, and it was a pretty fun product. I mean, it was for work, but we had robots that would go into supermarkets and take pictures of shelves to do automated inventory and detect when you actually need to put a product back so that people can buy it. So this was pretty cool. We did some embedded AI to do this analysis, and we had some really big constraints on the data we could store, because essentially it was a pipeline where we acquired a lot of images and had to process them fast enough that they would not accumulate; when we processed them, we would essentially just get some kind of JSON data saying which products are on the shelves or not. And multiple times we had issues where we deployed new versions of the AI algorithm that would crop the image, do some classification and so on, but it would get really slow, which would completely break the machine learning pipelines. At that moment I tried to search for some kind of automated benchmarking tool, and I didn't find anything. I mean, I found some benchmarking frameworks like pytest-benchmark and a lot of others, like github-action-benchmark, but it was really flaky when I tried to set it up, and we ended up not being able to use it. So we created a fake supermarket in the office, which was also pretty fun, but avoiding that would have been better. That was the first time I really wanted such a product, and then over the course of my small developer career I still had this issue, starting to build in Rust and everything. Later I saw that Samuel Colvin from Pydantic was rewriting Pydantic in Rust, essentially building Pydantic V2. This was probably three years ago, and he was looking for exactly the same thing I wanted at that moment, which was to be able to measure the performance of Pydantic in the CI environment. So I just reached out and said, okay, I had the same problem, it's really annoying, I'm going to try to find a solution, and I'd be happy to have your feedback and build from this. And so they became the first users, and yeah, that was pretty cool.

Bob:

Yeah, awesome. Because we have our pytest suite and we look for regressions, but we might not be mindful of the performance, right? It's less common to constantly measure that, as opposed to just plain pytest test failures. So I played with it this morning because I wanted to know a bit how it works. Really nice, right? You make an account and then you can just run it as a command line tool. But I also found the interesting part is to hook it up in GitHub Actions and have it run against every pull request, commit, whatever. And of course I did a contrived example with Claude, where we had some naive code and then some major speedup, and then you get a note in the PR saying that. So that's my experience so far. But maybe you want to highlight a bit more how it works, and then we can look at some of the features and what sets it apart.
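
As a rough illustration of that contrived naive-versus-optimised experiment, here is what such a benchmark might look like with the pytest-codspeed marker that comes up later in the episode; the de-duplication functions and the test are made up for illustration, not taken from the episode.

```python
import pytest


def slow_unique(items: list[int]) -> list[int]:
    # Naive O(n^2) de-duplication: the "before" version of the code.
    out: list[int] = []
    for item in items:
        if item not in out:
            out.append(item)
    return out


def fast_unique(items: list[int]) -> list[int]:
    # Optimised "after" version: dict preserves insertion order, so this is O(n).
    return list(dict.fromkeys(items))


@pytest.mark.benchmark  # measured by CodSpeed when the plugin is enabled
def test_unique_performance() -> None:
    data = list(range(2_000)) * 2
    # On the "before" commit this would call slow_unique; after the optimisation,
    # fast_unique. CodSpeed compares the two runs and comments on the pull request.
    assert len(fast_unique(data)) == 2_000
```

Run on two commits, one with the naive implementation and one with the optimised one, this is the kind of before/after signal that ends up as a note on the PR.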

Arthur:

Yeah, definitely. The experience you described is pretty much what we want to happen. Being able to run the benchmarks locally is really important, because optimizing performance is a lot about the performance feedback loop. That's also why we're building CodSpeed: first, you can have it in your CI, which shortens the performance feedback loop compared to just measuring performance in production with existing tools. But you can shorten it even more by measuring performance locally, so you can iterate very quickly, or have an agent iterate very quickly to optimize the performance, and then you can release it. The main goal is to have a signal, because usually performance regressions happen when you don't expect them. Having something in CI is just like having tests: you put them there, and you know that when something happens, at least you're going to get a warning and you'll be able to check. So that's mostly the goal of the CI integration: to let you know that the resource consumption for this new commit is creating a lot of issues, and it might affect your production state and create issues for your users, or it might just slow down the product for your users. That might not be a significant issue on its own, but over time those small performance regressions compound and make something really slow and not easy to use. It also gives you an opportunity to track the moments and the commits where you introduced performance regressions, so you know that when you introduced this new feature, it also degraded the performance. It gives you some clues on where to optimize.

Bob:

Yeah, super useful. And Pydantic ended up adopting it as well.

Arthur:

Yes, Pydantic and Samuel Colvin were convinced; they were the first users, so we were really excited, and they still use it today, even in new projects, which is pretty exciting. In the beginning the goal was to make any kind of software faster, but since we started with Pydantic, it also created a strong open source DNA in the product: we want to bring the product to open source projects so they can benefit from improving their performance and avoiding regressions. I've been building on open source software since the beginning, and it's really helpful. The contribution we want to make with CodSpeed is to make sure that all open source software can get faster, or at least avoid performance regressions. So this is a key part of the mission of the product: to help people write code that doesn't get slower and possibly even gets faster. And today we already have a small impact, because Pydantic, for example, is used in FastAPI, which is used in many places, and even Pydantic on its own is used in a lot of different places. That's really cool because it compounds: one percent of performance regression avoided in the validation part of Pydantic creates a lot of energy savings and a lot of other things. It's also what drives me to continue working on CodSpeed: the project is having a really positive impact on the environment, energy consumption and everything, which is pretty cool, and I was looking for that when looking for a project.

Bob:

Yeah, that's a big impact, because Pydantic is used in so many other libraries. So if you have a fix there, it's going to ripple through to so many projects. That's really cool. So, a bit about the open source model. It's free for public repos, I saw, and then for private ones... because I had a private repo and it didn't work, right? Because then you need to upgrade. But maybe we can talk a bit about that business model, and also how the open source component has shaped the roadmap of the product.

Arthur:

Yes. Essentially, using CodSpeed is completely free for open source, without any limitations. Then if you have a private project, the idea is that we want to price the product for companies using it, so if you have a team of five or more people, then you have to pay for it. So normally it should have worked for your project if you were alone on it; maybe this is a small bug on our side. Then, the core of the measurement part of CodSpeed, the CodSpeed CLI, the runner, essentially what brings the CPU simulation and the other instruments that allow you to extract performance insights from your code, is completely open source. The part that is closed source is the front end for visualizing it, generating flame graphs and everything. The goal is still for it to be 100% free for open source and completely available. So yeah, that's mostly it. We stack different instruments in this runner, the open source part I mentioned. For example, we can measure how much memory is consumed, and we can also simulate the execution of the CPU. This is really our big component and what really solves the problem, because I mentioned that it was hard to measure performance before, especially how long it takes to run software. The reason it's so hard is that you have a lot of variance: if you try to measure performance by just using time or timeit in Python and you put this in your CI environment, on GitHub Actions for example, which is also free, which is pretty cool, you will see that from one moment to another you get a lot of unexpected spikes. That creates a lot of issues, because then the threshold will not be the right one. This is mostly due to the fact that those are virtual machines, and GitHub, since it's free, is really over-provisioning the physical machines so it's cheaper for them. But the downside is that if someone is mining Bitcoin or whatever crypto next to your container or next to your VM, you will have a lot of unwanted noise in your measurement. So this is also something we spent a lot of time focusing on.
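
A quick way to see the wall-clock noise Arthur describes is to time the same function repeatedly with the standard library's timeit and look at the spread; on a busy shared CI runner the ratio between the fastest and slowest run swings far more than on a quiet laptop, which is why a fixed wall-clock threshold is so flaky. This is only a sketch; the workload and repeat counts are arbitrary.

```python
import statistics
import timeit


def workload() -> int:
    # Small, deterministic piece of work to time.
    return sum(i * i for i in range(10_000))


# Time the same workload 20 times (50 calls per sample) and report the spread.
runs = timeit.repeat(workload, repeat=20, number=50)
print(
    f"min={min(runs):.4f}s  max={max(runs):.4f}s  "
    f"stdev={statistics.stdev(runs):.4f}s  spread={max(runs) / min(runs):.2f}x"
)
```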

Julian:

Just a quick break. Let me ask you a question. How much of your last pull request did you actually write? And how much did AI write? If Copilot or ChatGPT disappeared tomorrow, would you still know how your code works, and could you explain it in a code review? This is the problem we hear about the most from developers like you who reach out to us for a chat. And that's how the Pybites Developer Mindset program helps: you become the developer who uses AI effectively, not the one who is completely propped up by it. Through one-to-one coaching, real-world projects, and proper code reviews with an expert coach, not AI, you'll actually understand the Python code that you ship. If you're tired of feeling like a prompt engineer instead of a real developer, check out and apply for PDM using the link in the description. Now back to the episode.

Bob:

Yeah, that's my next question. I had Saurabh on the podcast, he also made a performance tool, Codeflash, and we were talking about the typical challenges of benchmarking, which you also just described. One of the selling points in the README is the CPU simulation and how you got it to less than one percent variance. That's kind of the critical part of why this works, because you're able to eliminate that noise. So can you talk a bit about how you accomplished that?

Arthur:

Yeah, definitely. As I mentioned before, what creates a lot of noise is that multiple virtual machines share the same physical machine. Even if the data on those virtual machines is perfectly isolated, so you cannot go from one virtual machine to another, the hardware is still shared. Some of the CPU caches, like L1, the really fast caches, are dedicated to your virtual machine, which is perfectly fine. But when you go a bit higher in the memory hierarchy, the cache structure usually has three levels, L1, L2, L3, and the higher the level, the bigger the cache, and the last level of cache is shared among virtual machines. This is exactly what creates the unwanted noise. What happens is, if your neighbor is mining Bitcoin while you are accessing your array a bunch of times, for example to run a sorting algorithm, your neighbor is emptying and evicting all the data in the cache 100% of the time, which creates cache misses for you. So every time, you have to go back to memory, and these memory access patterns are what create a lot of the noise. Our approach with simulation is to say: okay, we're running in a perfectly isolated environment. Each instruction is executed, but we simulate how the data is fetched from or written to memory by simulating the caches. That means even if you're running in an environment where someone is emptying the caches 100% of the time, it's completely fine, because the caches are simulated anyway, and the data we collect is just how many simulated cache misses or cache hits happened. This helps a lot with containing the variance and reducing it down to one percent. And this one percent is actually pretty interesting, because what we found is that the variance we still have within that one percent is the inherent variance of the code. For example, when your Python code uses CPython, or your interpreter of choice, to allocate memory, memory allocators are often non-deterministic for various reasons. So we measure performance in a deterministic way, but the code itself is not deterministic, because there is a lot of variance in there. And if you allocate a hash map, let's say a dictionary, it also relies on random parameters in the hashing algorithm, and this creates a lot of non-deterministic behavior. So that's what the one percent variance is and where it's coming from.

Bob:

Yeah, so it's almost like you have this extra virtual layer, this shield around it, that protects the environment, right?

Arthur:

Yeah, yeah, definitely.

Bob:

How did you figure that out? A bit meta, right? Did you do benchmarking yourself to figure that out, or what was that aha moment like? Like, oh, well, this is how I'm going to solve it. Because people listening to this might want to create a library and build some distinguishing project, and I'm always curious: how did Pydantic find out they needed Rust? How did FastAPI figure out they needed Starlette and Pydantic? What was that aha moment here?

Arthur:

I think it was really about... and this simulation approach also has some limitations, happy to talk about those just after. But it was really about: we know how to build a CPU from zero, we know how to build everything, but we don't know what happens when we run code. And this was kind of driving me mad, because I was like, there is no way we don't have a way to anticipate how much computing power will be needed to execute a fixed piece of code. We created Python, we know exactly how it works, we have written interpreters, we know exactly what it compiles to, the assembly, the binary, everything. We created the CPU, we know almost exactly how it works. But we can't piece everything together and say: okay, this piece of code, if I put it in this program at this point in time, will take exactly X seconds to execute. Not having that solution kind of drove me mad. So thinking about simulation, and that maybe we could create some approximations, helped. I also bumped into an article, I need to find it again, mentioning Valgrind, which is essentially the low-level layer we are using for the simulation, and Cachegrind, and that was an eye-opener as well, because it was like: okay, so we can simulate how the memory is accessed. Most of the issues are around memory, so if I piece those together, it should work. That was the first result, and we iterated from there.

Bob:

Yeah, I'm going to ask you for that resource after our chat, and then we can link it as well. And how was it then to expand from Python to Rust and other languages that deal with memory in different ways? Was that challenging? How did you solve that?

Arthur:

Actually, Python is one of the hardest languages we integrate with, I mean, after Node.js. Mostly because, we also support Node.js, and the Node.js interpreter does a lot of just-in-time compiling. The JIT is essentially what we are also getting in Python, and I think it's getting more stable and even better, to actually make your code even faster. That is really nice on paper, but when you're thinking about benchmarking, it's a nightmare, because the non-deterministic execution paths I was talking about before multiply: your code can take several different amounts of time to execute even in perfect conditions. Whether your code is completely optimized, partially optimized, or not optimized at all creates a lot of different execution paths. For example, Node.js uses V8, the JavaScript engine built by Google, used in Google Chrome as well as in Node.js and other tools, and it has a really deep level of optimization, which means that for the same piece of code you can have tens or maybe hundreds of execution forms, which is really a problem. So actually going to Rust was easier, because in Rust or C or any compiled language, you build your code, it's optimized, and it doesn't move. That part was easier, but the counterpart was that creating the appropriate tooling around Rust was harder. In Python it's pretty easy: you want to run your tests, you do uv run pytest or just pytest, it's pretty simple. In Rust it's not that simple, because you need to compile the code and then execute it, and this compilation step in Rust or in C is pretty difficult, especially when you want to hook in instrumentation as we do. So that was harder on the part of integrating with Cargo and the whole Rust toolchain. But otherwise, for the measurement, the profiling, getting all the function data, it was much simpler, because in Python, if you track a function and which assembly instructions are actually executed, those will be the instructions from the CPython interpreter, written in C, not actually your code. So in Python, and likewise in Node and any interpreted language, we have to map those interpreter assembly instructions back to your actual code. That was a bit trickier in Python, so going to compiled languages was easier on that end.

Bob:

Yeah, that's interesting. A bit more on the other features then. There are differential flame graphs. So how do these help pinpoint exactly what got slower?

Arthur:

Yes. The idea is that we start from flame graphs. A flame graph is essentially a kind of tree representation of the execution of your code: each bar on the flame graph represents a function, and its children are the functions it calls. We use them in a differential way: we have the flame graph before and after your code changes, we combine them, and then we can highlight exactly which functions got slower or faster. We use color coding, so when a function is red it has gotten slower, and when it's green it's actually faster. This really helps you understand what actually changed in your code, because in the beginning we did not have this in CodSpeed. The first version, the one Pydantic used quite a long time ago, would just tell you: okay, my code is slower. Great. Now I need to figure it out. Here it gets more actionable, because you can see exactly which function was added; or sometimes functions were there but aren't anymore because you removed them, or some functions were added, so it helps you understand exactly what created those issues and how to address them. So yeah, it's mostly a visualization tool, and we are working on some others.

Bob:

Yeah, interesting. And there's also merge protection, right? So how would you recommend teams set thresholds without creating noise?

Arthur:

Yeah, it's exactly like Codecov or other coverage tools, because, as I mentioned in the beginning, a strong inspiration for CodSpeed was tools that let you track coverage and make sure your tests cover 100% of your code base. So you have a check in GitHub or GitLab or whatever that tells you that performance got degraded by X percent, and you can set a threshold that makes this check fail, essentially preventing you from merging any performance regressions. For the setting, what we recommend right now is to start with 10%, which is the default, and then lower it over time, mostly because you can start out with some benchmarks that have really non-deterministic execution patterns, as I mentioned in the beginning. The goal is for the tool to stay as actionable as possible. If we start giving a lot of false positives just because of the initial setup, it becomes really annoying and you stop caring about it, and that's neither our goal nor the goal of the person doing the setup. But then you can really reduce this threshold. Actually, we're working on making this completely automated, because we measure the variance of each benchmark, so we can lower it automatically. The goal is really to reduce it so that, once you're confident about your performance testing setup, you catch really fine performance regressions. And something we did not mention is that sometimes it's completely fine to introduce a regression. For example, if you're introducing a brand-new feature, say you're working on an API, maybe it makes all your endpoints 10% slower, or one endpoint 20% slower. If it's for a new feature, that's completely fine. When this happens, we have a performance owner who can go and acknowledge a performance regression, which is like saying: okay, we saw that this is a regression, but it's fine because it's working as intended. It's really to make sure you don't encounter unexpected performance issues later on in production.

Bob:

Yeah, because it's contextual, right? Like yesterday I was working on some internal Django app and added some signals to update some data upon saving a cart, which added some queries. And I'm like, yeah, extra queries; not with your tool, just in a general analysis. The consensus was: let's keep it and accept the extra queries. And in practice I didn't notice anything, right? So it's all relative to your app, to the usage, the audience, and I guess also the trade-off with the speed of shipping. So yeah, I think that sounds reasonable. Now, my last question, and you're going to like this one: AI coding agents. They're shipping, together with us, more code and faster. Have you seen an increase in regressions? And how do you think CodSpeed fits into an agent-driven workflow, which is more common these days?

Arthur:

Yeah, definitely. We see this quite a lot, and we adopted it ourselves. We talked briefly about it before, so you know that we are kind of big fans of those agents. What we saw, and it's also why a lot of PR review tools appeared, like Copilot and many others that can do code review, is that it's really simple to generate a tremendous amount of code really fast without really caring about whether it works well or is fast enough. So yeah, a lot of potential issues. And we're really excited, because when we started building CodSpeed this didn't exist; the problem wasn't as big as it is right now. Now, by default, your agent doesn't have any kind of performance guardrails. The goal for CodSpeed is first to be something that can define: okay, this function is really important to our code base, it's called millions of times per hour or per day or whatever, and it's costing either a lot of money, or its performance really has a big impact on our users because it's used a lot, it's a core component. So being able to add safeguards on those functions, and also to let agents measure performance using CodSpeed, is really the goal, so that your agent, whatever tool you're using, Cursor, Claude Code, or whatever, can get this performance feedback from us and iterate on performance. Something else we saw is that we can give context on performance and why it should be improved, but often there are some really deep technical concepts involved. For example, if you're working in Python, you might have performance issues that are not really related to the way you wrote your Python code, but to what's happening under the hood in CPython. We mentioned the JIT before, and as the language evolves and becomes faster and more powerful, we'll only see more of this: for example, you might have a pattern that prevents the JIT from optimizing your code. This already exists in JavaScript, and it will exist for sure in Python. There are a lot of things like this where you think: okay, my goal is to write code, not to think about the internals of Python right now. So the goal is that your agent can handle this with CodSpeed, communicating and using those performance feedbacks. It's really exciting, because working on performance was kind of reserved for people with really deep technical knowledge of low-level languages, and now it can become much more accessible, and it will be way easier to optimize.

Bob:

Yeah, that's a general trend, right? With agents, because we can work faster, we can address things we normally wouldn't, so we're becoming more horizontal, you have more breadth of skills. Sorry if I missed it, but what is your recommendation then? For example, you work with Claude Code. Do you put something in your CLAUDE.md file, or how do you have it communicate with CodSpeed?

Arthur:

Right now the integration is not completely done, but you can make it use the CodSpeed CLI. Essentially you can say: okay, run CodSpeed, run pytest and whatever test suite you want to measure, and it will get the performance feedback. That's the first step. But as I mentioned before, just as for a human, that's interesting, but then, just like a human, the agent will be like: okay, so what do I do next? So what we will ship in normally a week or a bit more is essentially the ability to give the flame graphs and all the profiling data to your agent, so it can also optimize based on this data.

Bob:

And that's what they're really good at, right? If you give them that granular detail, they're really good at that. Yeah.

Arthur:

In the experiments we have running currently on our own code base, it's really like: okay, I should optimize this, and it does things you wouldn't really think of doing, like: okay, yes, there is this regular expression, I'll just rewrite it by hand in C. I would never do that. And that's also kind of the limit of the tool: it can create patterns that cause issues for code readability, which might be a problem. But at least if you want to make something really, really fast, you can do it, even without having a lot of knowledge around it.

Bob:

I'm definitely going to try this and just put it in my dev dependencies, add it to my Makefile, and then start looking at flame graphs and feeding them to the agent. Sounds really fun to me. So maybe three steps for people to get started: go to codspeed.io, make an account, then uvx or uv add the tool, I guess to your developer dependencies, and then just run and experiment with it from the command line. Again, I really like to just use the YAML workflow you offer and have it run automatically. What else? I think that's a good start for people, right?

Arthur:

Yeah, definitely. And you can also start using it with just pytest-codspeed, which is essentially a benchmarking framework, so you can use it even without connecting to CodSpeed. The idea is that you can use it in a kind of headless way, without the profiling and everything, to iterate really quickly, and then when you know you actually want to hook this up to your CI, you can just add a simple GitHub workflow, create your account, and integrate with CodSpeed. But yeah, that's a pretty good start.

Bob:

One thing I missed, about the plugin. So you would add the pytest plugin? Is there a plugin for this?

Arthur:

Yes, it's called pytest-codspeed, and essentially it's a benchmarking plugin with a marker, really close to pytest-benchmark. With the marker you can mark some tests, or some entire files, for example your integration tests, as benchmarks. Then if you run pytest --codspeed, it will give you the usual benchmarking data, without simulation or anything, but it gets you started with performance measurement. And then it's really easy to integrate with CodSpeed later down the line, because essentially it's just running this in the simulated environment or with whatever tool we provide. So yeah, it's pretty straightforward from there.
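
Based on that description, a benchmark file might look roughly like this, assuming pytest-codspeed is installed as a dev dependency; the module-level pytestmark marks every test in the file as a benchmark, plain pytest still runs them as normal tests, and pytest --codspeed measures them. The parsing function and test data are invented for illustration.

```python
import json

import pytest

pytestmark = [pytest.mark.benchmark]  # every test in this file becomes a benchmark


def parse_payload(raw: str) -> dict:
    # Hypothetical function under test: the thing whose speed we care about.
    return json.loads(raw)


def test_parse_small_payload() -> None:
    raw = json.dumps({"id": 1, "tags": ["a", "b", "c"]})
    assert parse_payload(raw)["id"] == 1


def test_parse_large_payload() -> None:
    raw = json.dumps({"items": list(range(5_000))})
    assert len(parse_payload(raw)["items"]) == 5_000
```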

Bob:

It's even easier, yeah. It's similar to coverage, right? I never install coverage itself, I just do pytest --cov, and then you basically have pytest with that extra telemetry. Yeah, nice. Well, I hope our audience jumps on this, because I think it's very useful. And thanks for sharing today. Oh, we always end with a reading tip, if you are reading anything, but otherwise, maybe not right now?

Arthur:

I mean, nothing but source code.

Bob:

I mean, you are too busy to read, but reviewing code, yeah, more code reading. I can attest to that.

Arthur:

Yeah, instead of writing it. But yeah, thanks for having me. That was really, really enjoyable.

Bob:

Yeah, thanks for sharing. Thanks for the work that you do, and do join our community. I will share the episode there and people can reach out to you there. But of course, I will link your socials as well if people want to reach out. And yeah, thanks for the great work and for sharing today. Yes, thanks. Bye.

Julian:

Hey everyone, thanks for tuning in to the Pybites Podcast. I really hope you enjoyed it. A quick message from me before you go: to get the most out of your experience with Pybites, including learning more Python, engaging with other developers, learning from great guests, discussing these podcast episodes, and much, much more, please join our community at pybites.circle.so. The link is on the screen if you're watching this on YouTube, and it's in the show notes for everyone else. When you join, make sure you introduce yourself, engage with myself and Bob, and the many other developers in the community. It's one of the greatest things you can do to expand your knowledge, reach, and network as a Python developer. We'll see you in the next episode, and we will see you in the community.