Data Science Hangout with Wes McKinney
This transcript and summary were AI-generated and may contain errors.
Summary
In this Data Science Hangout episode with Rachael Dempsey, I discuss my recent transition from Voltron Data to Posit as Principal Architect, and my long history of collaboration with the company going back over a decade. I explain how RStudio (now Posit) helped incubate what became Ursa Labs and eventually Voltron Data through their support of Apache Arrow development.
The conversation covers several topics: the challenges of cross-language package management between R and Python (including tools like Conda, Mamba, and Pixi), how Posit is learning to support Python users through projects like Quarto and Shiny for Python, and the importance of building cross-language bridges rather than engaging in “language wars.” I argue that the real competition is open source versus proprietary software, not Python versus R.
I also address questions about LLMs and AI—how Copilot integration is table stakes in 2024, my concerns about ethical use and workforce impact, and the potential for LLM-assisted data exploration beyond just code generation. On career advice, I emphasize focusing on people and relationships with a long-term mindset, noting that working relationships from a decade ago remain valuable today. I also share how I developed a thick skin for negative feedback by trying to understand that criticism often reflects what others are dealing with in their own lives.
Key Quotes
“We’re not working R versus Python, we’re really open source versus closed source proprietary software. And so, kind of expanding the tent of open source, and making open source the preferred and attractive way for businesses to go forward, that is our main challenge.” — Wes McKinney
“Getting past the red team, blue team bickering about whose programming language is better has enabled us to have just a lot more productive conversations about how we can build real tools for humans that are usable and accessible.” — Wes McKinney
“When somebody’s annoyed with your software not doing what they wish it did or being perfect in the ways that they wish for it to be perfect, that comment is as much about them as it is about you. And so, you have to try not to take it too personally.” — Wes McKinney
“The feedback that you get in an open-source project tends to only be negative feedback. Sometimes you get people telling you thank you, but mostly you get the negative feedback—people engage when there’s something that they don’t like, or something that doesn’t work, or something that’s missing.” — Wes McKinney
“The downside of making developers more productive is in the future we need fewer developers, most likely. You would hope that at some point as we all become more productive that we can work less. But often the pointy-haired bosses would prefer that we do twice as much work in the same amount of time with better tools.” — Wes McKinney
“I’ve made a personal promise to myself to never work on a Python packaging tool because we’ve all thought about it because you get frustrated with the packaging… I might build a text editor before I build a Python packaging tool.” — Wes McKinney
“Focus on people and relationships, investing in your relationships with people that you want to work with for a long time and also viewing your working relationships with a long-term mindset, not just what value do we have to each other right now, but what value might these working relationships have in the future.” — Wes McKinney
“I have people that I’ve worked with actively off and on for over a decade, and I treasure those relationships. Taking a more people and human-centric mindset towards that has been a lot more rewarding and more valuable long-term.” — Wes McKinney
Transcript
Rachael Dempsey: Hi, everybody. Welcome back to the Data Science Hangout. I’m Rachael. I lead Customer Marketing at Posit, and I’m so excited to have you joining us today. The Hangout is our open space to hear what’s going on in the world of data across different industries, chat about data science leadership, and connect with others who are facing similar things as you. We get together here every Thursday at the same time, same place. So if this is your first time joining us, it is so nice to meet you. If it is your first time, let us know in the chat. We’d love to say hi and welcome anybody who’s joining for the first time today. We’re all dedicated to keeping this a friendly and welcoming space for everybody, so we love hearing from you no matter your years of experience, title, industry, or the languages that you work in. It’s totally okay to just listen in if you want, and it’s awesome to be a part of the party that happens in the Zoom chat. You’ll notice that people share a lot of helpful resources and thoughts in the chat as well. There are three ways that you can jump in and ask questions or provide your own perspective on certain topics. First, you can raise your hand on Zoom, and I’ll keep an eye out and call on you. Second, you can put questions in the Zoom chat. If you’re in a coffee shop or maybe out walking your dog and can’t talk, put a star next to your question and I’ll know to read it for you. Otherwise, I’ll call on you to introduce yourself and add some context. And third, we also have a Slido link where you can ask questions anonymously.
I did just want to mention, because we have quite a few people joining for the first time, that we also have a LinkedIn group for the Data Science Hangout. It’s not always the easiest place for ongoing conversations, but it’s helpful for connecting with people that you meet here or through the chat. I also wanted to say real quickly before we get started, and I know Curtis will share this in the chat, that the Call for Talks just recently opened for the Posit conference this year, and Curtis can share the link there in the chat. One other quick note: if you are watching this recording sometime in the future and want to join us live, the link to add it to your calendar will be in the details below. And there are no rules that anybody has to stay on the whole time or talk; come and go as your schedule allows.
But with all that, thank you again for joining us. I am so excited to be joined by my co-host today, Wes McKinney, Principal Architect at Posit. Wes, I’d love to kick things off by having you introduce yourself and your role, and also share something you like to do outside of work.
Wes McKinney: Yeah, I’m a software developer and an entrepreneur. I’ve worked on a bunch of open source projects in and around data science, and Python in particular. I started what became the pandas project about 16 years ago. I’ve started a couple of companies and created some other organizations oriented toward providing funding for open source development, partly with Posit’s help. I worked on Apache Arrow, the data infrastructure and computing layer for data science tools and database systems, and I worked closely with Posit, when it was still RStudio, throughout that project. Most recently, I spent a little over three years starting Voltron Data and getting it off the ground as a 130-person company or so. Just last fall, I made the transition out of my full-time role at Voltron Data; I’m still an advisor there, on the advisory board. And I decided to rejoin Posit in a more full-time capacity to work on making awesome polyglot data science tools and open source software.
Rachael Dempsey: And what about something you like to do outside of work?
Wes McKinney: Oh yeah, outside of work. I do a lot of yoga. It turns out both Hadley and I are super into yoga, which is just random and unrelated. I also like to cook, run, and read books. And I still find a little bit of time for console video games now and then. I have a soft spot for retro games, so every now and then I’ll indulge in a playthrough of an old Super Nintendo game or something like that.
Rachael Dempsey: Love it. I know you shared a little bit about this in your intro, but to kick us off, could you share a bit about your path and your decision to join Posit?
Wes McKinney: Yeah. People who know me well know that I’ve been collaborating with RStudio, now Posit, for a pretty long time. I met JJ Allaire, the CEO of Posit, well over a decade ago, when RStudio was just getting started. And I started talking with Hadley about things that we could work on together that would help end the data science wars. We looked out for opportunities to work together on things that would create more reusable software, help data science teams working with open source technology be more productive, and enable Python and R teams to work together more effectively.
And so a couple of things happened. Firstly, the IPython project started the IPython notebook, which became Jupyter, and so there started to be work on notebooks and development environments that could be shared across Python and R. Then, when I was interested in starting what became Apache Arrow, one of the first things I did was reach out to Hadley about building a data format that would enable data to be shared between dplyr and R data frames on one side and pandas data frames on the other, to make using R and Python together a lot faster and a lot simpler. We got together in February 2016 and built what became Feather. That was the start of the more hands-on collaboration with RStudio.
A couple of years later, I partnered up more formally with Posit to create Ursa Labs. Posit provided significant funding for Apache Arrow development and helped with the administration of Ursa Labs so that I could focus on building software as opposed to running a company. So, in a way, Posit essentially helped me incubate what became Ursa Computing and then Voltron Data. Posit remains a shareholder and an active participant in the things we’ve been doing in Arrow and at Voltron Data. So when I was considering my next move after being full-time in a CTO role at Voltron Data, the deciding factor was that Posit has expanded its umbrella of products to support Python and to build for polyglot data science teams. I felt there was an opening for me to use my experience and skills to really help bolster that effort, align the product offerings, and create really amazing experiences for data science teams.
So it ended up being really great timing, given that I was at a crossroads in my career, deciding how to allocate my time between the various projects that I work on. I’ve been an enormous fan of the company, so it’s great to be back, working more closely with the team and having the opportunity to work on software that has such a large impact on so many people.
Rachael Dempsey: Thank you. Well, we’re all so excited to have you here, too. I know I introduced you as Principal Architect, but it might be helpful to explain a bit about what that means. What is a principal architect?
Wes McKinney: Yeah, so at present I’m a mixture between a very technical product manager and a senior software developer or software architect. On one hand, I’m helping provide feedback, strategic guidance, and alignment on Posit’s product roadmap and features, and identifying blind spots: things that Posit needs to build in order to create really great tools and systems to support, in particular, Python-focused or Python-only data science teams.
I’m also, as I’m finding my sea legs in the different projects going on in the company, doing some strategic, targeted, high-leverage development work on different projects, taking advantage of my background and experience with tools like pandas and low-level data work. In areas that I know really well, I can jump into a project, provide some targeted development assistance, and help accelerate the project’s feature roadmap.
It’s only been about 10 weeks, so we’ll see where things are in another three or six months, by the time of posit::conf in August. But it’s been really exciting, and there’s so much going on. It’s also an opportunity for me to get up to speed on all the things that Posit’s been doing in the last five years while I’ve been busy working on Arrow and on startups, and to spend time listening: looking at what data science teams are struggling with right now, what’s working well, what isn’t, and where the opportunities are to build things that make life better for data science teams.
Rachael Dempsey: Thank you. I’m starting to see a few questions coming in, and I know a few people joined right after the start, so I’ll just remind everybody: if you want to ask questions, you can post them in the Zoom chat and I will call on you to jump in, or you can ask questions anonymously, and our team will re-share the Slido link in the chat for you. Sunday, I see you had a question about Voltron Data. Do you want to ask that live?
Sunday: Sure. Can you hear me?
Rachael Dempsey: Yes, I can.
Sunday: Yeah, interesting background. It’s interesting to hear how many things you’ve been involved in: Posit, Arrow, and Voltron. But Voltron is the one that’s completely unfamiliar to me. What’s that about? Sorry, my name is Sunday. I’m a senior research analyst with Premier Incorporated Health Consultants, and I work with all of these tools. I’m calling from Baltimore. Thank you.
Wes McKinney: Yeah, well, we could spend the rest of the hour just talking about Voltron Data and all the things that it’s doing. We produced a really great knowledge base called the Composable Codex that goes through the history of, and trends in, data processing systems, and explains how we are currently in an era where we’re working toward modularizing the layers of the data stack to make them more interoperable, faster, and more reusable.
Apache Arrow was one major piece of technology that was a missing key to enabling some of this modularization and composability. Last year, we worked with one of our development partners, Meta, and their data infrastructure team. They have a project called Velox that is helping unify and modularize query processing and query execution inside Meta’s internal data platform. Together we wrote a paper, and I might fumble the title, but it’s the Composable Data Management System Manifesto.
So I would highly encourage you to check out the Composable Codex, which is on the voltrondata.com website, as well as the VLDB paper, the Composable Data Management System Manifesto, which goes into the technical foundations of where we are in the development timeline of database systems, data science systems, and data processing, and why we are working to create these modular and composable components for building next-generation data systems.
Voltron Data just recently launched its distributed, GPU-accelerated, modular data engine, which we are working to incorporate into data infrastructure providers like HP. We believe that employing accelerators, like GPUs but also custom silicon and FPGAs, is going to be a big trend in terms of reducing the carbon footprint of large-scale data processing and machine learning.
So there’s a deep rabbit hole there, but it’s a very exciting time in the world of computing. From the standpoint of data scientists, which is most people on the call, we just want to write data frame operations and SQL queries and have things go fast and not use a lot of energy.
Another project that I’ve been very involved with over the last decade, and that has a pretty good-sized team at Voltron Data, is the Python Ibis project, which provides a unified data frame API to tons of different SQL and non-SQL backends. That’s one of the spokes of the strategy to facilitate the modularization of the data stack. You can think about Ibis as being kind of like dplyr for Python. We’ve put a lot of development work into maturing that tool in the last few years.
Rachael Dempsey: Thank you. I see, James, you had a question in the chat about working on a multilingual team. Do you want to jump in here next?
James: Sure. So, I work on a multilingual Python and R team, but it’s not a software development team so much as a process team. We take processes that currently exist, ask how we can automate them and make them run faster, and get data to the end users who need it for a variety of purposes. One of the main problems I’ve encountered on that team is that I’m an R user. I use projects in renv, and all of the Python users, all six or seven of them, use conda environments. Is there any progress on helping those two different workflows speak better together?
Wes McKinney: I don’t know much about renv, if that’s how you pronounce it, but I know that it is possible to manage an R dependency stack from conda. That’s maybe not the answer that you were looking for. But there is a bunch of recent work on improving package management tools in general, particularly around cross-language packaging. One of the problems in the Python world is that there’s the Python Package Index (PyPI) and there’s the pip installer tool. PyPI has the problem that it doesn’t help you with non-Python dependencies, and developers have to do a lot of work in order to package up and deploy their packages as wheels on PyPI.
So, as a package developer, I’ve had to suffer a lot of hardship in order to get my software, like PyArrow, onto the Python Package Index. Basically, pip, Pipenv, Poetry, and the various other Python tools only work if your whole dependency stack lives on the Python Package Index. So if you have R, or other things that need to be installed for your full application, these pip-centric or index-centric tools are not going to work for you. The same is true if you have an R-centric application that relies on CRAN.
So, really, we need to use cross-language or language-agnostic package management tools that help create reproducible environments. I think Conda has the unfortunate association that people see it as basically a product of Anaconda, the company. But nowadays you can get a non-Anaconda-distributed installation of Conda, and of Mamba, which is a faster replacement for Conda.
And there’s a new packaging tool called Pixi. You’re probably saying to yourselves, why do we need another packaging tool? But I’m pretty excited about Pixi for a couple of reasons. Firstly, it enables you to manage your dependency stack without having to edit YAML files manually: you just add or remove a dependency, and it manages the YAML files for you, similar to npm, or Cargo if anyone’s used Rust.
So I don’t know that it’s necessarily the answer you’re looking for, but I would say that looking at tools like Pixi, or even Conda, and trying to use them to manage both the R environments and the Python environments that you want to deploy is one possible approach. There might be other approaches that I’m not aware of, but if I were in that situation, that’s what I would try. And if I couldn’t get it to work, I would write an angry blog post saying, I tried to do this and I failed. Maybe that would motivate the developers to fix it.
I’ve made a personal promise to myself to never work on a Python packaging tool. We’ve all thought about it, because you get frustrated with the packaging and you want to say, hey, I’m going to go fix this. But I’ve made a pledge that I will not. I might build a text editor before I build a Python packaging tool.
Rachael Dempsey: I was trying to make a few associations between some of the questions, and maybe I’ll get it right, maybe I won’t, but Regis, I think your question might be related to this one. Do you want to jump in?
Regis: Yeah. So, this is my background. I usually say this is what data scientists do, because we bring data together so that we can enable decisions, but for the purposes of my question, this is R, that’s Python, and maybe this is Posit, I guess. That’s the scene at store. Anyway, I know that you’re talking about being able to use the two together, but what I’ve been hearing, because I’m from R-land, is that usually when you try to deploy a tool somewhere, you have to spin up a virtual server with a Docker image inside of it in order to keep the project’s packages consistent, so that it works the same at home when you deploy it as it did when you built it in the factory. You were saying, Wes, that you’ve gotten frustrated with package management before and might end up writing an angry blog post, but, not to toot Posit’s horn, it’s really effortless to deploy R projects and do however many iterations you need in Posit Team. I’m just wondering if there is a plan to make it equally painless for Python, and for Python and R together, without any additional stuff. It’s a real blocker: right now, if you have to spin up a virtual instance with a Docker image, you have to talk to the IT department and have them do all of that, especially if you need everything to be the same. But I don’t want to miss out on everything that’s happening in Python-land. So that’s really what I’m asking.
Wes McKinney: Yeah. I don’t have a simple answer for you right now, but I think you’ve hit the nail on the head: as a data scientist or a system developer, how to make this just work in the production environment is not the thing that you want to spend your precious time sweating over.
In an enterprise setting, with the various controls that are in place in the production environment, I think it’s critical to make that process easy. To your point, Posit has made a big investment in making it seamless and just work for R-only applications. Getting to that point with Python applications, or applications that use Python and R, is a critical thing.
So it’s one of the areas where I’m getting up to speed on what’s been done, while also trying to stay up to date on the active work going on in the Python community to improve this deployment and productionization problem. Even Python itself is pretty fussy about site-packages, and that’s still a thing that plagues users. So it’s great to hear that feedback, and it’s something I will definitely be zooming in on, to help get to a place where people are happy with how things work.
Regis: Thanks for that answer. One very unsatisfying pseudo-solution that I’ve thought of is to use Python only when I have to, keep it restricted to just the things I need it for, deploy that as an API, and then reach into that API from within R. That’s because you can guarantee that R will work the same, but you can’t always guarantee that for Python, especially if, say, the IT department changes dependencies and those propagate and break things. But I know that’s not great or satisfying.
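The workaround Regis describes, keeping the Python-only step behind a small HTTP API that R calls, can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not a production setup: the `summarize()` step, the JSON shape `{"values": [...]}` and the `/summary` path are hypothetical, and `call_summary()` is a Python stand-in for the R client (in R, a package such as httr2 would send the same JSON POST).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def summarize(values):
    """Hypothetical Python-only step: simple summary statistics."""
    n = len(values)
    return {"n": n, "mean": sum(values) / n if n else None}


class SummaryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Any POST path is accepted; /summary below is only illustrative.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(summarize(payload.get("values", []))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_server(port=0):
    """Serve on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), SummaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_summary(port, values):
    """Stand-in for the R client: POST JSON and decode the JSON reply."""
    data = json.dumps({"values": values}).encode("utf-8")
    request = Request(
        f"http://127.0.0.1:{port}/summary",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler keeps localhost traffic off any configured proxy.
    opener = build_opener(ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print(call_summary(port, [1, 2, 3]))  # {'n': 3, 'mean': 2.0}
    server.shutdown()
    server.server_close()
```

The design trade-off is the one Regis names: the Python dependencies are isolated behind a stable JSON contract, but you now operate a second service instead of sharing one environment.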
Wes McKinney: Yeah, I think the ideal thing is that you would have essentially the equivalent of a package lock for Python, so you know that what you develop and test locally is exactly what you’re getting in the production environment. I think some users may even want to be able to build a Docker image locally and have some tooling around that, so you can say: take this requirements file and this base Docker image and build it for me, so that I can wire it up with my R application and use it through reticulate or through a web API. Because having something that works locally, deploying it, and then, maybe it works initially, but then it gets broken because of something that was out of your control, that’s no fun.
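The “package lock” idea comes down to recording exact versions at development time and checking them again at deploy time. As a minimal standard-library sketch, assuming a simplified file of `name==version` lines like common `pip freeze` output (real lock formats such as `pixi.lock` or conda-lock files carry far more, including hashes and non-Python packages), the hypothetical helpers `parse_lock()` and `check_pins()` below report which pins the running environment does not satisfy.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_lock(text):
    """Parse 'name==version' lines into a dict; comments and blanks are skipped."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, ver = line.partition("==")
        if not sep:
            # This simplified format only supports exact pins.
            raise ValueError(f"not an exact pin: {raw!r}")
        pins[name.strip().lower()] = ver.strip()
    return pins


def check_pins(pins, get_version=version):
    """Return {name: (expected, found)} for every pin the environment misses.

    found is None when the package is not installed at all. get_version
    defaults to importlib.metadata.version and can be swapped out in tests.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    lock_text = "not-a-real-package-xyz==1.0  # hypothetical pin\n"
    print(check_pins(parse_lock(lock_text)))
```

Running a check like this at startup turns “it worked locally, then broke in production” into an explicit error naming the drifted package, though it only detects drift; it does not rebuild the environment.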
Regis: Just to respond, and I know I’m taking too much time, but give me 10 more seconds here. I’m grateful that you’re going to be working on that, because the Docker thing doesn’t really fly. You can’t ask the IT department for a new Docker image every time you need one, especially for the stuff we do as data scientists. But yeah, thanks. I’ll turn off my camera.
Rachael Dempsey: Thank you, Regis. Okay. Abigail, I see you had a question about some of the lessons Posit is learning from Python users. Want to ask that here?
Abigail: Yeah, thanks. I feel like Quarto and Shiny are the two big projects. I’ve used both of them. I went back and forth a lot with a very patient Posit developer on why my Python build was not compatible, and I think he was a little taken aback by how many rounds we were doing and how broken my Python build was. So I feel like that’s one of the issues. In general, what has Posit taken away from those two fairly big tools and from trying to pull in Python users, and how has that gone? Any lessons learned?
Wes McKinney: I think Quarto is definitely a great example. I think the project started out not working super closely with the Jupyter ecosystem. At some point, while the project was still flying a little bit under the radar, they reworked things to make it something that works really well for the Jupyter-centric user.
For example, Jeremy Howard is somebody who was a very Python-centric user working on deep learning, and he wanted to write his book in Jupyter notebooks and then convert it into a Quarto website, but also generate DocBook XML for O’Reilly Media. And for my own book, I got in touch with JJ and the Quarto developers and said, I would like to publish this book online; is Quarto something that can help?
So I think that the R community and the Python community are working from different starting places, and the ecosystems have developed organically in different ways.
Another thing that I often think about is that the Python ecosystem, on a relative basis, has a much wider spectrum of types of users, in terms of the work that they are doing and the applications that they’re building. In the R world, and again, the generalization is not 100% valid, I would say that the plurality of people working in R are doing data science, data analysis, statistics, or machine learning. In the Python world, you have a lot more users who are doing some data work with these tools, but that data analysis work might be incidental to some other aspect of their jobs, some other job responsibilities. That might be software engineering, building production applications, building web services, or many other things. Obviously, you can do those things in R too, but in terms of how these teams have developed inside companies, the people and the teams end up looking fairly different.
I think it’s great that we have this opportunity for cross-pollination, including in the open source libraries. The Python community has learned a lot from the R community in terms of API design, usability, and user experience, and I’ve tried to incorporate lessons from the tidyverse, dplyr, and ggplot2 into the things that I’ve built. And I think the R community has also learned from things that the Python community has done.
So I think there’s enormous opportunity. We have a lot more active and healthy dialogue, and we’re not hearing as much about the language wars, like “Python sucks,” or “R sucks,” or “R is not a language.” I haven’t seen one of those blog posts in a while. And so I think getting past the red team, blue team bickering about whose programming language is better has enabled us to have just a lot more productive conversations about how we can build real tools for humans that are usable and accessible, and how we can basically make the tent bigger and bring more people in.
Because really, the challenge that I see, and that I think others see as well, is that we’re not working R versus Python; we’re really open source versus closed source proprietary software. And so, expanding the tent of open source, and making open source the preferred and attractive way for businesses to go forward, that is our main challenge.
So it’s one thing to have a bunch of open source software that an individual can get up and running with. But as soon as they want to use it at work, they may find that they’re running into tons and tons of roadblocks, and there’s a lot of not-so-fun stuff that needs to be built to take free software that you can download from the internet, or pull from GitHub, and make it work in a business setting. So I think we’ve made great progress, but there’s still a long, long road to go.
Rachael Dempsey: Thank you. Thank you, Wes, I love that, real tools for humans. I see Russell put that into the chat as well. Libby, I see you had a question a bit earlier. Want to jump in here next?
Libby: Um, sure. It was a long time ago. Let me see if I can pull it up. Ah, okay. So, I’m just wondering: as a person who has been a driving force behind a lot of packages and platforms with really wide visibility, which most of us on this call have not done or experienced, do you feel like you had to develop a thick skin, speaking of red team versus blue team, given all of the feedback and opinions that you get, or that we see, from all directions? And if you did have to develop a thick skin, how did you go about that?
Wes McKinney: I did. I will say that I don’t think I’ve always handled the negative or petty feedback with grace. I can think back on times where I became upset, or had an emotional reaction to something that somebody said.
I remember one time Jeff Reback, who now has more contributions to pandas than I do, was corresponding with an issue reporter on GitHub, and the person asked him if he was a QA developer or a real developer. There are all these little comments that needle. People complain about things, and maybe they’re missing the bigger picture, or they don’t realize that yes, it’s imperfect, but you already worked really hard on what’s there. You feel like they don’t appreciate what you have built and are only focused on what’s wrong with it.
And so I think the feedback that you get in an open source project tends to be mostly negative. Sometimes you get people telling you thank you, like, “Thanks for building this thing. It’s great. I use it.” But mostly you get negative feedback: people engage when there’s something they don’t like, something that doesn’t work, or something that’s missing.
And it is tough. It is really tough. In terms of building a thick skin, I don’t know if anybody on this call has ever listened to or read David Foster Wallace’s “This Is Water” speech. It was a commencement speech from 2005 or so, and it encourages you to be more empathetic and more compassionate about the subjective experience of other people. So when you get this negative feedback, you have to realize that it isn’t always about you or a criticism of you. It may be that the person is just having a tough time. They’re having a bad day. Everybody’s got stress; everybody’s got difficulties in their job or difficulties with their family.
So whenever you’re seeing this feedback, you don’t have the context of that person and what they’re dealing with in their life. I think you have to take negative feedback with a grain of salt and realize that when somebody is annoyed with your software for not doing what they wish it did, or for not being perfect in the ways they wish it were, that comment is as much about them as it is about you. So you have to try not to take it too personally.
I do not read my Amazon book reviews; I just decided not to do that. I try not to Google myself, and I rarely look at my Google results. I don’t look things up on Twitter (I guess it’s X now). Social media is not a good place to get feedback. But every now and then there’ll be comments on GitHub that hurt. Over time, it starts to hurt less, and after a while you become numb to it. I think Hadley is totally numb to any sort of feedback, even to the point where he doesn’t mind being too blunt with users. It’s like, “I’m sorry you don’t like that, but I’m not going to fix it,” basically.
Rachael Dempsey: I love that, Wes. Thank you so much. It’s reminding me of the episode of Ted Lasso I just watched last night, too. I wish Deep Show was here for that. Let’s see. Okay, Dan, I see you asked a question a little bit earlier about LLMs and generative AI. Do you want to jump in?
Dan: Yeah, sure. Thanks, Rachael. Just curious, Wes, to hear your top-of-mind views on how Posit intends to position itself around generative AI and LLMs, particularly given all the different ways that LLMs rely on data, even though they’re not always transparent about it. What are your thoughts on how Posit can meaningfully position itself as a tool for developing generative AI, or for testing and monitoring generative AI responses, and so on?
Wes McKinney: Yeah, I will preface what I’m saying by noting that there are aspects of this that are, I’m not sure what the phrase is, above my pay grade. But I do know that Copilot is a big deal for developer productivity. That means making Copilot available and easy to use, letting people opt in, and using the LLM features in a way that helps people be more productive and use the tool the way it’s best used. For example, you can use Copilot to generate unit test cases for you and to help with repetitive tasks where you might otherwise have to do a bunch of typing and maybe make a lot more errors than Copilot would make the first time.
Obviously, there are copyright concerns, IP concerns, and other concerns about misuse of LLMs. And I am concerned about the ethical use of LLMs and their potentially harmful effects on society.
I think the downside of making developers more productive is that in the future we will most likely need fewer developers, and that does have a negative effect on the workforce. But you would hope that at some point, as we all become more productive, we can work less. I think that’s always been the dream of increased productivity: in the future we don’t all have to work as hard. But often the pointy-haired bosses would prefer that we do twice as much work in the same amount of time with better tools.
I don’t know if that answers your question, but I think integrating Copilot into Posit’s product offerings is table stakes in 2024, so we can have LLM-assisted data exploration, development, and testing. I’m also interested in other sorts of LLM-assisted workflows around data, like actual data analysis: helping you ask better questions about the data. I think that’s one interesting area of research. So not just “show me how to make this plot with ggplot2,” but “what are some other questions I could ask, or ways I could look at this data, that might bring more insights?” I think that’s a pretty interesting area, kind of like an LLM-powered research assistant.
Dan: Yeah, I think you’re talking about ways in which LLMs can make R or Python use more productive. I was almost thinking the opposite: how Posit, as an IDE or development workspace, could help in the creation and augmentation of, say, open-source LLMs. And maybe it’s not the right tool for that. Maybe other tools are better suited to that use case, which is fine too. But maybe in the future there’s a place for Posit in that space.
Wes McKinney: Yeah, I mean, the primary language of LLM research and implementation is Python, and I think there’s some C++ development and probably some Rust development here and there. We’re building development environments that assist teams doing or collaborating on LLM development, but I don’t know that we have any specific plans, or at least specific plans that we’re able to talk about, around tooling to support LLM model development specifically. But I do think it’s an interesting and important area. And given that the Python world in particular has played a pretty essential role as the canvas where a lot of these tools are being developed, I think as a company we should look seriously at how we can help, or at least at what we can reasonably do to help with LLM model development and the collaboration process around it.
Rachael Dempsey: Thank you. Thank you. I want to make sure I don’t forget to go over some of the anonymous questions on Slido. One that was asked over there was: in your opinion, what’s the best way to get started in Python?
Wes McKinney: Yeah. If you’ve never worked in Python before, my advice is always to find something concrete that you would like to do with Python. It could be something really mundane, like if you’re curious about what you’re spending your money on at Amazon or something like that. I get really curious about my personal finances. So find some problem that’s relevant to your life.
And I think there are a number of great books that help with learning Python from a beginner standpoint, like Automate the Boring Stuff with Python. I know the author of that book. There are also some other introductory books just for the Python language.
I have a book that’s now in its third edition called Python for Data Analysis. If you’re interested in learning about data analysis and data science, it has a quick start in Python. It doesn’t go into object-oriented development or building serious software in Python, but if you’re looking to learn just enough Python to use pandas, get up and running with Jupyter Notebooks, and work with data, I think it’s a great resource for that. I wrote the book intending it to be a quick start for somebody who has some basic programming experience but wants to use Python for data analysis and data science.
Rachael Dempsey: Thank you. Let’s see. Russell, I see you just asked a question in the chat about advice for building a company. Want to jump in?
Russell: Sure. Great presentation. Thank you. As someone who has built several really successful companies, do you have any recommendations for people who would like to build companies, particularly data science companies?
Wes McKinney: Yeah. It’s very difficult. My route to building companies has been definitely different from a lot of people’s, because I’ve really focused on the technology and then retrofitted a business model and a corporate structure around it, in support of the open-source projects.
And so for me, the process has been: build an open-source software project, build critical mass, start engaging with users of that project, and then learn from those users what the next set of problems is that needs to be solved around it, beyond the open-source project itself and the low-level technology problems.
In the case of Voltron Data, for example, the first product that we launched was an enterprise support and open-source partnership program for Apache Arrow. We recognized that there were businesses incorporating Arrow into their own products and systems, and they needed a reliable partner and a private channel to discuss issues that they encountered using the open-source software in their internal development. It also created a structure where we could align on development initiatives within the open-source project that they wanted to put funding behind, but in a more structured way, where there’s a contract with deliverables, timelines, concrete resourcing, and all of the things that you would need around a commercial enterprise contract.
I think other people start companies by identifying a business problem and conceptualizing a product that solves that problem, so they start more from the product on down. Given what the product needs to do, the technology that you build to deliver that product ends up being incidental. In a sense, it’s like the iceberg: the bit of ice that you see above the water is the product, but there’s all this other stuff that you build underneath, and from the user’s standpoint, they don’t need to know that much about how it’s built, just that it works.
I haven’t built too many of those kinds of companies, but I’ve seen many people work on them. So my advice is to learn from your users and be open-minded. Have strong opinions, but hold them loosely, so be willing to change your mind and learn from feedback. I’ve also benefited greatly from mentorship and help from many others in the generation above me. I’m 38, and I’ve been doing entrepreneurial things for the past 11 or 12 years. In my 20s, I benefited greatly from folks who had 10 or 20 years of experience on me in entrepreneurship and in open-source software. I learned a lot from people in the Python community about open-source community building, project development, and culture. Without that mentorship and help from others who’d been on the path before me, it would have been harder for me to get where I am now. So, standing on the shoulders of giants, certainly.
Rachael Dempsey: Excellent. Thank you. Thank you. While we’re on this topic of advice for people starting a company, Wes, what is one of the most memorable pieces of career advice you’ve either received or given throughout your professional journey?
Wes McKinney: I would say that probably the most frequent advice I give to other people is to focus on people and relationships. Invest in your relationships with people you want to work with for a long time, and view your working relationships with a long-term mindset. That means asking not just what value you have to each other right now in the project you’re working on together, but also what value those working relationships might have in the future.
So for me, in both an entrepreneurial setting and an open-source software setting, I’ve taken that long-term, relationship-building mindset in how I interact with other open-source developers. There might be no opportunity to collaborate with somebody right now, but there might be one in five years. So it’s a balancing act: how you spend your time, who you decide to collaborate with, and where to spend the eight or ten working hours in your day. But I think it comes down to finding people you like working with, people you feel inspired and productive around, who help generate ideas and help you feel motivated about what you’re working on, and then cultivating those relationships with people you want to work with for a long time.
I have people that I’ve worked with actively, off and on, for over a decade, and I treasure those relationships. I’ve definitely seen other people who work with a more transactional mindset, thinking short-term about their relationships, or treating people as interchangeable parts who serve a short-term purpose: achieving some business goal or completing some task in front of them. For me at least, taking a more people- and human-centric mindset has been a lot more rewarding and more valuable long-term, so I always encourage other people to take that approach as well, to the extent that they’re able.
Rachael Dempsey: Absolutely, that’s great. Thank you. I know this is always one of the quickest hours of the week for us, so I will be sure to go through, check out any questions we missed, and try to collect those. But I did just want to ask one more question. We are going to be in Minneapolis as a company for our company-wide work week, and some people submitted questions ahead of time for a panel that we’re having there. You don’t have to cram all of this into the two minutes here, so let me make it more generic: what are you most excited about at Posit in the year ahead, and what is your vision for Python work at Posit going forward?
Wes McKinney: Well, that’s a big question. Personally, this has been one of the first opportunities I’ve had in the last several years to get my head above water and take my data science walkabout, so to speak. I’m learning more about what everybody else has been building and doing in the last five or six years, a period when I’ve really had my head down and been focused on the Arrow project and its ecosystem.
And I’m really impressed with everything I see that’s been built in the last few years around developer tools, productivity, environments, and tooling that supports all these different types of work.
I’m also thinking longer term about what my next big projects are going to look like, and what else I could work on besides continuing to nudge along the projects I’ve worked on in the past, to help them reach the next plateau of progress and growth in the ecosystem. So I’m in a learning mode.
The people I’m working with at Posit make it a very stimulating environment and a place for me to learn and come up with new ideas. So I’m really excited to see what that holds for me personally, and also for the company and for the data science ecosystem more generally.
Rachael Dempsey: Thank you so much, Wes. I really tried to cram a few questions into that last one, and you did a great job there. I really appreciate you joining us to share your experience. This has been great.
Wes McKinney: Thank you all. Thanks, everybody, for hanging out with me for an hour. I enjoyed it, so thank you.
Rachael Dempsey: Yeah, thank you all so much for joining us today. I know a few people were joining the Hangout for the first time today, so I just wanted to let everyone know that we have these every Thursday at 12 Eastern Time. If you use the short link I just shared in the chat, you can add them to your calendar. Come whenever it fits your schedule. I’m so excited to go back and read through the chat and all the resources people shared. If you want to save the chat for yourself, open the three dots below it in Zoom and you can save it from there. Thank you all so much for the great questions. I really appreciate it. Have a great rest of the day.