
Pybites Podcast
The Pybites Podcast is a podcast about Python Development, Career and Mindset skills.
Hosted by the Co-Founders, Bob Belderbos and Julian Sequeira, this podcast is for anyone interested in Python and looking for tips, tricks and concepts related to Career + Mindset.
For more information on Pybites, visit us at https://pybit.es and connect with us on LinkedIn:
Julian: https://www.linkedin.com/in/juliansequeira/
Bob: https://www.linkedin.com/in/bbelderbos/
#076 - Data engineering involves more Python than you might think!
This week we have Christo back on the show to talk about his experience in the data engineering field.
He shares some valuable tips on how to become a more effective data engineer which, surprisingly or not, increasingly requires a well-rounded Python developer skill set.
Enjoy and feel free to reach out to Christo below ...
Christo's website:
https://www.christoolivier.com
Christo is a PDM coach now, check it out:
https://pybit.es/catalogue/the-pdm-program/
Previous episode Christo was on:
https://www.pybitespodcast.com/1501156/8005574-013-the-mindset-of-a-developer
He is also in our Slack community:
http://pybit.es/community/
I think the moment that you start experiencing friction, and what I mean by friction is you start experiencing that you are hitting the limit of your understanding of something or hitting a point where you feel like you are not entirely sure about how this works, don't shy away from that. That's a very good indicator that there's something you're missing.
Hello, and welcome to the Pybites Podcast, where we talk about Python, career, and mindset. We're your hosts. I'm Julian Sequeira. And I am Bob Belderbos. If you're looking to improve your Python, your career, and learn the mindset for success, this is the podcast for you. Let's get started.
Welcome back to another Pybites Podcast episode. This is Julian. I'm here with Bob. How's it going, man?
Hey, man. Glad to be back. How are you doing today?
Yeah, good. I'm going to largely spend this episode on mute. It's funny, just a couple of weeks ago I said I was getting over COVID, and now I'm getting over the flu, so I'm having a really great time at home, which is awesome. Still getting over this cough, so I'll be muting myself to make sure I don't blast anyone's eardrums. What about you? You good?
Yeah, good. Man, that takes mindset. You have had, like, all the COVID, influenza, a sick baby, all that stuff.
It's odd ones. Definitely been rough, but it feels good.
Mister Mindset, you know how to handle that.
It's nice to have some normalcy and be recording the podcast again. So that feels good. But on that note, I shouldn't be so rude, talking like it's just the two of us. We have a special guest with us today. Christo, who has been part of the Pybites family for yonks, forever, is back. Do you know what episode Christo was on, Bob?
17?
Oh, geez. Did research. Nice. I have no idea.
No. So, Christo: 13. 13, the mindset of a developer.
I see. Perfect. Well, Christo, welcome back to the podcast.
Hey, guys.
It's nice to be back and it's good to catch up with you again.
Yeah, it's been a while since we had you.
It has been a while. That was episode 13. What are we at now? 76 or so?
Yeah, 76.
It's been a while, more than a year. We're grateful to have you back on because so much has changed. But yeah, just to kick it off, why don't you give everyone a quick rehash of what you're up to? Just introduce yourself, really.
Yeah, sure. I'm happy to do that. Well, as the people that caught me on the first round know, I work as a freelancer, predominantly doing data architecture and cloud data platform design and development. So since the last podcast episode, I have been working for a US cybersecurity firm, building a pretty interesting solution for the marketing and revenue operations side of the business. So it's been a lot of Python, as I reckon is typical for most guests on the podcast. But yeah, it's been fun. It's been keeping me very, very busy, and a lot of learning has come out of it. So good times all around.
Awesome, awesome. And I think there's one more thing that changed since, right?
Which thing would that be? Would that be the thing that's causing me a lot of loss of sleep? And we're not talking about financial problems.
I was not going there, but no, no, no, the coaching. So we're actually really, really excited to announce, well, not announce, but announce, that Christo Olivier is currently a coach for the PDM program. And so, Christo, we're grateful for all of the effort and dedication you've given to PDM and the people that you've coached. And we're excited to share that you're on the PDM page. Yes, your picture's up there.
Yeah, man, that's awesome. I completely forgot about that because that's not causing any sleepless nights. Always a good experience, being able to be part of PDM and,
yeah, take other people through the journey that I've been on and help them, specifically with regard to Python and data. So it's been awesome to be part of that and also to meet the other coaches. So, yeah, I don't know what else to say. It's been a joy so far and I hope it continues.
Awesome. Yeah, we're grateful for that. And talking about data, maybe we can move into the meat of the episode. We wanted to talk about data and Python and how overused or underused Python is in the data world. Or maybe there are some false expectations. I don't know. What do you see happening?
Yeah, it's a very interesting one, Bob. I've had this experience repeatedly over the years, right? Whenever we talk about Python and data, a lot of the people that embark on that particular journey don't realize just how small a subset of Python they are actually using. So we end up with a massive focus on things like the analytics packages, the data science packages, the machine learning packages in Python. What you don't realize when you're using just NumPy, pandas, PyTorch, the TensorFlow libraries, all of the data science and ML-related packages, is that you're not actually touching the vast majority of Python. You end up not learning about the intricacies of packaging your application, you don't learn about building command line apps or how to turn some of the things you've built into usable pieces of software. And this is my opinion, and I'm sure there'll be people that differ, but I find, especially in my work on the data engineering side and the platform engineering side, that a lot of people use Python and they'll glue a lot of things together, or they'll use Python to process data, say, with something like Spark. But actually what you're doing is directing another tool or another system to do the processing.
You're using Python as a DSL, a domain-specific language, to do the things you want to do, but you'll run into problems where you want to use a new package to do something specific, and you've got some import problems, and all of a sudden you start dropping into that gray area that you've seen in tutorials. You might have touched on it, you had this issue once before, but you're not very comfortable with it and you run into challenges. Or you realize that creating a particular command line application would be a really big win for you and for the team that you're working with, but you lack the skills to actually do that confidently. And this is the bit about Python and data that, I wouldn't say annoys me, I think I'm just acutely aware of the fact that people don't typically grasp how large the Python ecosystem is. Probably that would go for a lot of other programming languages as well, but Python has specifically suffered from this because there are so many data-specific packages where you end up just working within them. With other programming languages, I highly doubt that a C developer would sit with this problem. They'll be creating systems and components. Golang is another example, where it's a lot of back-end systems development. There's a lot more application development, a lot more software engineering going on with it, whereas Python has carved out a really nice niche but also really siloed these things within the language ecosystem. It's easy to not realize how much more benefit you can derive from learning some of the other disciplines.
No, that's really great insight. And because you're well versed in this space, right, and you've been working in it for years and everything like that, do you see a lot of people coming in thinking, okay, I want to be a data engineer, I just need to know these specific libraries, like the ones you talked about, like PySpark and so on?
So they have this sort of misconception: this is all I need to know. And then what happens? Do they hit a wall? At what point do you think people suddenly start to realize, I should have more of a holistic understanding of Python and how more of these moving pieces work? Maybe at some point they wake up and go, oh, crap, this isn't exactly what I thought it was going to be. Have you seen that sort of thing?
I've seen that actually happen a lot, Julian. It's a bit of an insidious thing, because when someone starts out as a junior coming into the projects and into the field, you are going to be specializing and focusing on being able to work with something like PySpark, as you've mentioned. That's a really good example of it. But as you start moving up in seniority, you start tackling bigger and bigger problems, and that's where you start running into these issues where you realize, you know what? I actually don't know enough to tackle this particular problem that's come along. But it comes at a really weird point, because you've probably gained some seniority, you're moving up the ranks in the company that you're working at, and now you've got to drop down and be a beginner again in these other parts that you've got no experience in. And for someone that thinks they've now progressed and they're now more senior, it can sometimes be a really hard knock to realize, I've got to be a junior again in this other stuff in order to really get myself two steps up the ladder and increase my value and my skills. And for some people, that's actually just too much to swallow, right? They don't want to swallow their pride and realize that they have to go back to being a junior. I've been fortunate that I've learned those lessons, and this is why I'm talking about it. I've learned those lessons myself.
But it would have been great if someone else had told me this back in the day when I started out: have your specialization, but never take your eye off the ball. See the entire landscape, know where you need to dip back in, and be humble enough to be a junior in something in order to get yourself much further than you would have if you just clung to the thing that you've learned and invested all of this time and effort in. And it does happen a lot in the data world because we work with such specialized Python packages. I've had it a few times where people have joined teams that I was working on. They would say, oh yeah, no, I know Python. And then when I started having a conversation I realized, well, they know how to use pandas and they're really good at it, right? They're extremely good at it. But the problem we were solving was building custom connectors for a new integration tool that came on the market, and they were just completely out of their depth, and it's no fault of their own. It's just that they didn't learn about software engineering, they didn't learn about software architecture. None of those things matter when you're writing pandas or when you're using NumPy to solve data and computation problems, scientific computing and all of the different specializations that people have. But it does matter if you're going to start tackling larger, more complex data engineering and data integration problems, because you're going to encounter new software, you're going to have to learn how those things work. You're going to have to understand the things that other people whose software you're using already understand really well, so that you can get up to speed fast and start delivering value.
Testing? I use Jupyter notebooks.
Testing, I use Jupyter notebooks. But notebooks is a good example, Bob.
There are a lot of people who'll be able to build a Jupyter notebook that extracts data out of an API, transforms the data, and lands it in something. And they do that because they're comfortable. They've learned Python using Jupyter notebooks. But now what happens if I need to schedule that thing to run regularly and the platform that's available to me in my company is not Jupyter notebooks, right? Because you can schedule notebooks with something like Databricks Spark. That's sort of their environment that you work in. But if you are running a completely different setup, now I need to kick something off with cron, or I need to use Airflow or Dagster or something else. How do you then go from a Jupyter notebook to something that you can actually make production ready in your environment? And those are exactly the things I'm talking about, right? If the only thing you've ever done is work in Jupyter notebooks, and there are actually a lot of people, especially in the data and data science part of the world, that think that's where Python's at, right? I'm doing Python. And you are, but you're using it in a very specific and very niche way.
Yeah, I can kind of relate as well. Back when I was a developer, the amount of DevOps stuff we actually had to know just to get the system running locally: Linux, Docker, setting up different microservices. Nothing related to writing code, but we were supposed to know it, as adjacent skills, I would like to call them.
Right, yeah, exactly. And those adjacent skills, I find that what those things give you is a really good mental model for how things work. So we're sort of now moving away from the idea of just focusing on Python here and all of the different things within Python. But if you had a really good grounding in all of those things, then making the leap to something like Docker or Kubernetes becomes a lot easier.
So you've got all of this muscle memory, you've got all of these mental models for how things work, and you are then able to just roll and adapt with new technology, because it's improving all the time. Or, let's not say improving, that might be the wrong word. It's changing all the time. And your experience helps you to change with it a lot faster than someone that comes in cold.
Yeah, totally get that. And you know, this is really amazing insight, because I feel like a lot of people just have this stereotype of, you know, this is what I need to be a data engineer. This is what all the courses are teaching me. This is the content on the Internet, on YouTube, all that stuff. So I really like going back to one of the things you said just before, that you wish someone had told you this stuff when you were starting out. And so this is you paying it forward. So there you go. That should help you sleep better tonight. Well, thanks, Christo.
I appreciate that, Julian. If you can tell that to my nine-month-old son to make him give me... because that's what I was alluding to before we talked about being a PDM coach. That's the thing that's been severely interrupting sleep. He's been an absolute champ at that. If there was an Olympics for interrupting his parents' sleep, he'd definitely be a gold medal candidate.
I think my kids will give yours a run for the money, but I think they're all at this point that Pybites is the remedy for a lack of sleep, or a compensation.
Says you, who gets an eight-hour sleep.
Not always. Not always.
All right, seven and a half, Bob. Oh yeah, only seven and a half. Poor Bob. But Christo, to wrap this up, you know, I want to end on a practical note here. So the recommendation pretty much is that, you know, and it's not just data engineering.
I feel like a lot of this is relevant no matter what field you're in. For me, even if you're not in a programming field, it is useful to have these holistic skills, because then if you're a builder, if you're the kind of person who likes to solve problems, you can solve whatever problems you want for your team, even if you are in, say, finance or communications or whatever the hell it is. But in this instance, with data engineering, it's having these DevOps skills, but also having these holistic Python skills and knowing things like working with APIs, potentially knowing the web frameworks so you can, say, present data in a different way that's actually useful for the majority of people, things like that. How would you suggest people go about, I guess, branching into that stuff?
You've probably left the hardest question for last, Julian. I think the moment that you start experiencing friction, and what I mean by friction is you start experiencing that you are hitting the limit of your understanding of something, or hitting a point where you feel like you are not entirely sure about how this works, don't shy away from that. That's a very good indicator that there's something you're missing. Dig into it, lean into it and go learn about it. You do time box it, right? If it's not the primary thing that you're working on right now, don't go and spend all of your time on that. But put some time aside, a good couple of hours, to start digging into it and understand it better, to the point where you can explain it to someone else or where you can actually build something Python related. Right. So for me that would be... I'm not massive in web development. It's not something that I do as part of my day-to-day work. But the moment I rubbed up against my limitations or hit the wall in my understanding of how web frameworks work and how web development works, it was really good to sit down and dedicate some time to building something in Django.
And even if it's just going through the Django tutorial, build the Django app, or Flask or whichever one of the frameworks you want to use. Go build something and experience what it's like, because you'll realize that you already possess enough knowledge to get started on that path. And anything that you encounter that's completely foreign, you become comfortable with it, and you've now broadened your understanding. And that just puts you one step above the rest of the community or the rest of the people that you'll be working with that aren't willing to do that. And you might never do web development. In my case, I very rarely do web development, but I'm now not in a position anymore where someone says we really need to get a quick website up to display this thing, something on the systems integration side or the platform engineering side, and I'm stuck. I can do that now because I've invested that time and I'm not scared to try it out, I'm not scared to work on it. So that's how I've gone about doing it. There are loads of other examples of what would have worked for other people, but to me that's been the one that's helped me, because it's kept me honest with the work that I do on a daily basis, while still branching out into these other areas that I don't get a chance to work with on a day-to-day basis.
That's awesome advice. I think it's similar to what we always say: if you're comfortable for too long, you're not growing enough. By deliberately picking up new skills, you force yourself to become uncomfortable for a couple of hours, but then when you're asked, you can actually pick up a task and almost be comfortable with it.
Yeah, exactly.
The basics now.
Exactly. And maybe web development is something that you don't push up against, right? Maybe it's actually going from writing your Jupyter notebook to turning that into a command line app.
Have you actually thought about building a command line app that takes the API that you're connecting to, takes the data, does something with it and pushes it into a destination? Just that exercise. You can probably keep yourself busy for countless hours improving that: adding configuration to it, being able to push it to different destinations. You can learn so much from that exercise, and you're going to have to learn about software architecture, how to make this thing more maintainable. And it's going to be a concrete example. It's something that you're actually working with, and you might realize that you end up putting that thing in production. The team that you're working with might find it extremely valuable, and you've done something that benefits not just your own understanding and knowledge, but the team that you're working with, the company that you're working for. These are the sorts of things that I always keep an eye out for personally, because this is how you learn, and you learn best when you're working on something that you are directly involved in. So it doesn't benefit you as much if you're just taking some tutorial somewhere. And I know we talk about this PDM style stuff a lot, but it's difficult to not bring it back to PDM, right? And people are going to start saying every time I get on the show I'm just basically paid advertising for PDM. But that's why PDM doesn't give you a bunch of example apps that everyone builds, because you're not going to be as engaged, you're not going to be as bought into the thing you're trying to build and you're not going to learn as much.
Well, funny story: when we worked on Django together in PDM, we actually ended up with the repository pattern, which you then later presented at the code clinic. So that was not planned. A toy app in Django ended up with some heavy design patterns stuff. So that's great, I'm really happy that you added that point.
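As an illustration of the command line exercise Christo describes here (take data from an API, do something with it, push it to a destination), the following is a minimal sketch only. It assumes a hypothetical endpoint that returns a JSON array of flat records and uses a CSV file as the destination; the function names `extract`, `transform`, `load` and `main`, and the `--fields` and `--output` flags, are illustrative choices and do not come from the episode.

```python
"""Hypothetical sketch: a notebook-style extract/transform/load flow as a CLI."""
import argparse
import csv
import json
import sys
import urllib.request


def extract(url: str) -> list:
    """Fetch a JSON array of records from a URL (assumed response shape)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def transform(records: list, fields: list) -> list:
    """Keep only the requested fields; missing values become empty strings."""
    return [{f: rec.get(f, "") for f in fields} for rec in records]


def load(rows: list, fields: list, out) -> None:
    """Write rows as CSV to any writable text file object."""
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def main(argv=None) -> int:
    """Parse arguments, then run extract -> transform -> load."""
    parser = argparse.ArgumentParser(description="Copy API records to CSV.")
    parser.add_argument("url", help="endpoint returning a JSON array")
    parser.add_argument("--fields", required=True,
                        help="comma-separated field names to keep")
    parser.add_argument("--output", default="-",
                        help="destination CSV path, or '-' for stdout")
    args = parser.parse_args(argv)

    fields = args.fields.split(",")
    rows = transform(extract(args.url), fields)
    if args.output == "-":
        load(rows, fields, sys.stdout)
    else:
        with open(args.output, "w", newline="", encoding="utf-8") as fh:
            load(rows, fields, fh)
    return 0
```

Calling `main()` under the usual `if __name__ == "__main__":` guard would turn this into a script that could then be kicked off by cron or another scheduler. Keeping the three steps as separate functions is one possible design; it leaves room for the follow-on work mentioned in the conversation, such as adding configuration or pushing to a different destination.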
Even a small command line app can really go advanced and in all kinds of directions. And that's where the real learning happens.
Exactly.
Thanks for bringing that up as well, Christo, because as you were both talking, I was like, geez, PDM sounds like it'd be a great solution for people. And then you brought it up. So I appreciate that.
You're most welcome. And again, it's not because I'm helping on the mentoring side or anything, it really is just the way it is, right? It follows this principle and this really works.
Awesome. Well, thanks, Christo, for that and for sharing all that valuable insight into the data engineering space. I think a lot of people are going to find this useful, and I'm hoping there's a handful of people that you're going to inspire to branch out, and for this to be the sort of advice you wish you'd had years ago. So thanks for sharing the insight. I really do appreciate it.
You're most welcome. Thanks for having me on the show and for us being able to talk about it.
Well, we're not done yet, right? You don't get to hang up yet.
Jeez.
Any books? What are you reading? What are you reading, son?
What am I reading? I am not reading anything right now. I'm trying to sleep.
I feel so much better, because I haven't read a book in three weeks.
We always have a backup plan. What's a win you want to share?
Well, man, you're putting me on the spot big time. I reckon a big win for me is I'm actually wrapping up a very large project. I've been on this since November of 2020, and we've had massive success with this marketing insights and automation platform that I've built. So that's wrapping up at the end of July. And, yeah, I could not have been happier with the way it's gone. The client is ecstatic. So that's a big win for me, being able to draw a successful conclusion after such a long project.
And a lot of the challenges we faced, all of them resolved. We built out a really straightforward architecture and platform, something that's really made a difference to the bottom line of the organization. So to me, that's a massive win, and that's why I do the freelance stuff that I do.
Awesome, man. Well, I'm going to throw a quick win in there on your behalf as well, because you're too modest.
I don't think anyone's ever called you modest. Are you too modest?
You shared your new website with us and it looks spectacular.
Thank you.
So we've got this really cool picture of you in a suit, or, well, you know, jeans and suit. You're looking way too flash for my liking, but I really liked it and I thought it was really professional. So we're going to throw that in the show notes as well so people can get to know you a little bit more.
Fantastic. Thank you. That'll be super cool.
But I really enjoyed it. That's why I thought it was worth calling out. But I think that's pretty much it. Bob, did you have anything else you want to add?
No, I think that's it. Thanks so much, Christo, for sharing those insights. I'm sure it will inspire our audience. Any final shout-out or words before we wrap it up?
No, just thanks again for having me on the show. And if anyone listening to this is stuck and they feel that they could benefit from building some of the things we talked about or doing PDM, if they want to work with me on PDM, then I'm there to help on the data side. So reach out and, yeah, make that change in your career and in your Python journey. We're all here to assist you on that.
Beautiful. I love it, man. Well, thanks so much again, Christo. We really appreciate you being here. We promise to have you back sooner than 60 episodes, or whatever it's at now. So we will have you back sooner rather than later. But everyone listening, thank you so much. That was Christo
Olivier. We really appreciate you being here and everything that you do, and we will be back next week, so thank you for listening.
We hope you enjoyed this episode. To hear more from us, go to pybit.es/friends, that is pybit dot es slash friends, and receive a free gift just for being a friend of the show. And to join our thriving Slack community of Python programmers, go to pybit.es/community. That's pybit dot es slash community. We hope to see you there and catch you in the next episode.