
Andrej Karpathy's YC talk

This is a transcript of Andrej Karpathy's talk on Software 3.0 on June 17th. I was in the audience, and it was recorded in a noisy environment - set your expectations accordingly.

YC has said the official video will take a few weeks to release, by which time Karpathy himself jokes the talk will be deprecated. https://x.com/karpathy/status/1935077692258558443

and ok wow, this is going viral. let's keep in touch?

I think it's actually an extremely unique and very interesting time to enter the industry right now. And I think fundamentally the reason for that is that software is changing. Again. And I say again because I actually gave this talk already. But the problem is that software keeps changing, so I actually have a lot of material to create new talks. And I think it's changing quite fundamentally. I think broadly speaking, software has not changed much at such a fundamental level for 70 years. And then it's changed, I think, about twice quite rapidly in the last few years. And so there's just a huge amount of work to do, a huge amount of software to write and rewrite.

So let's take a look at maybe the realm of software. If we're going to think of this as a map of software, there's a really cool tool called Map of GitHub. This is kind of like all the software that's written. These are all instructions to the computer for carrying out tasks in digital space. So if you zoom in here, these are all different kinds of repositories, and this is all the code that has been written. And a few years ago, I observed that software was kind of changing and there was a new type of software around, and I called this Software 2.0 at the time. And the idea here was that Software 1.0 is the code you write for the computer. Software 2.0 is basically neural networks, and in particular the weights of the neural network. You're not writing this code directly; you're more like tuning the data sets and then running an optimizer to create the parameters. And I think at the time, neural networks were kind of seen as just a different kind of classifier, like a decision tree or something like that, and so I think this framing was a lot more appropriate.

And now what we actually have is kind of like an equivalent of GitHub in the realm of Software 2.0. I think Hugging Face is basically the equivalent of GitHub for Software 2.0. And there's also Model Atlas, and you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle, these are the parameters of Flux, the image generator. So any time someone tunes a LoRA on top of a Flux model, you basically create a git commit in this space and create a different kind of image generator. So basically what we have is: Software 1.0 is the computer code that programs the computer; Software 2.0 is the weights, which program neural networks. And here's an example of AlexNet, an image recognizer neural network. Now, so far, all of the neural networks that we were familiar with until recently were kind of like fixed-function computers, like image to categories or something like that. And what's changed, and I think it's a fundamental change, is that neural networks became programmable with large language models. I see this as quite new and unique; it's a new kind of computer, and in my mind it's worth giving it the designation of Software 3.0. And basically, your prompts are now programs that program the LLM. And remarkably, these prompts are written in English. So it's kind of a very interesting programming language.

So maybe to summarize the difference: if you're doing sentiment classification, for example, you can imagine writing some Python to basically do sentiment classification. Or you can train a neural network. Or you can prompt a large language model. So here I have a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way. So basically we have Software 1.0 and Software 2.0, and I think we're seeing, I mean, you've seen a lot of GitHub code now, it's not just code anymore; there's a bunch of English interspersed with code. So I think there's a growing category of this new kind of code. Not only is it a new programming paradigm, it's also remarkable to me that it's in our native language of English. This blew my mind a few years ago now. I tweeted this, and I think it captured the attention of a lot of people. My currently pinned tweet is that, remarkably, we are now programming computers in English.
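
To make the three paradigms concrete, here is a minimal sketch in Python. It's my illustration, not the slide from the talk; the word lists, the hypothetical train_classifier call, and the few-shot prompt are all made up:

    # Software 1.0: explicit code, written directly by a human.
    def sentiment_1_0(text):
        positive = {"great", "love", "excellent"}
        negative = {"bad", "hate", "terrible"}
        words = set(text.lower().split())
        return "positive" if len(words & positive) >= len(words & negative) else "negative"

    # Software 2.0: the "program" is the weights; you curate a dataset and
    # run an optimizer. (train_classifier is a hypothetical stand-in.)
    # model = train_classifier(labeled_reviews)
    # label = model("I loved this movie!")

    # Software 3.0: a few-shot prompt, written in English, programs the LLM.
    prompt = """Classify the sentiment of the review as positive or negative.

    Review: "I loved this movie!"
    Sentiment: positive

    Review: "Total waste of time."
    Sentiment: negative

    Review: "{review}"
    Sentiment:"""  # fill with prompt.format(review=...) and send to a model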

Now, when I was at Tesla, we were working on the Autopilot, and we were trying to get the car to drive. And I showed this slide at the time, where you can imagine that the inputs for the car are on the bottom, and they're going through the software stack to produce the steering and acceleration. And I made the observation at the time that there was a ton of C++ code in the Autopilot, which was the Software 1.0 code, and there were some neural nets in there doing image recognition. And I observed that over time, as we made the Autopilot better, the neural network basically grew in capability and size, and in addition to that, all this C++ code was being deleted. A lot of the capabilities and functionality that was originally written in 1.0 was migrated to 2.0. So as an example, a lot of the stitching up of information across images from the different cameras, and across time, was taken over by a neural network, and we were able to delete a lot of code. And so the Software 2.0 stack quite literally ate through the software stack of the Autopilot. I thought this was remarkable at the time, and I think we're seeing the same thing again, where basically we have a new kind of software, and it's eating through the stack. It's a completely new programming paradigm.

And I think if you're entering the industry, it's a very good idea to be fluent in all of these paradigms, because they all have slight pros and cons. You may want to program some functionality in 1.0, or 2.0, or 3.0: are you going to train an LLM, or are you going to just prompt an LLM, or should this be explicit code, etc.? So we all have to make these decisions, and actually potentially fluidly transition between the paradigms.

So what I want to get into now is, in the first part, to talk about LLMs: what I think of this new paradigm and its ecosystem, and what that looks like. What is this new computer? What does it look like? And what does the ecosystem look like? I was struck by this quote from Andrew Ng, actually many years ago now, I think. And Andrew is going to speak right after me. But he said that AI is the new electricity. And I do think that it captures something very interesting, in that LLMs certainly feel like they have properties of utilities right now. LLM labs, like OpenAI, Gemini, Anthropic, etc., spend capex to train the LLMs, and this is kind of equivalent to building out a grid. And then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of demands that are very utility-like demands of this API: we demand low latency, high uptime, etc.

In electricity, you would have a transfer switch, so you can transfer your electricity source between, like, grid, solar, battery, or generator. In LLMs, we have maybe OpenRouter, to easily switch between the different types of LLMs that exist. Because the LLMs are software, they don't compete for physical space, so it's okay to have, basically, six electricity providers that you can switch between, right? They don't compete in that direct way. And I think what's also really fascinating, and we saw this in the last few days actually, is that a lot of the LLMs went down, and people were kind of stuck and unable to work.
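
As an aside, the "transfer switch" is concrete enough to sketch. This assumes OpenRouter's OpenAI-compatible endpoint; the model IDs and the key are placeholders:

    # One client, many "electricity providers" (sketch).
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # OpenRouter speaks the OpenAI API
        api_key="YOUR_OPENROUTER_KEY",
    )

    for model in ["openai/gpt-4o", "anthropic/claude-3.5-sonnet"]:
        reply = client.chat.completions.create(
            model=model,  # flipping the switch is just changing this string
            messages=[{"role": "user", "content": "Say hello."}],
        )
        print(model, "->", reply.choices[0].message.content)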

And I think it's kind of fascinating to me that when the state-of-the-art LLMs go down, it's actually kind of like an intelligence brownout in the world. It's kind of like when the voltage is unreliable on the grid, and the planet just gets dumber, the more reliance we have on these models, which already is really dramatic and I think will continue to grow. But LLMs don't only have properties of utilities. I think it's also fair to say that they have some properties of fabs. And the reason for this is that the capex required for building LLMs is actually quite large. It's not just like building some power station or something like that, right? You're investing a huge amount of money, and the technology is growing quite rapidly. So we're in a world where we have, sort of, deep tech trees, research and development secrets, that are centralizing inside LLM labs. But I think the analogy muddies a little bit also because, as I mentioned, this is software, and software is a bit less defensible because it is so malleable. And so I think that's just an interesting kind of thing to think about.

There are many analogies you can make. A 4-nanometer process node, for instance, is maybe something like a cluster with certain max FLOPS. And you can think about it this way: when you're training on NVIDIA GPUs, you're only using the hardware, not building it; that's kind of like the fabless model. But if you're actually also building your own hardware, and you're training on TPUs and you're Google, that's kind of like Intel, owning your own fab. So I think there are analogies here that make sense. But actually, the analogy that makes the most sense, perhaps, is that in my mind, LLMs have very strong analogies to operating systems, in that this is not just electricity or water. It's not something that comes out of the tap as a commodity. These are now increasingly complex software ecosystems, not just simple commodities like electricity. And it's kind of interesting to me that the ecosystem is shaping up in a very similar kind of way, where you have a few closed-source providers, like Windows and macOS, and then you have an open-source alternative, like Linux. And I think for LLMs as well, we have a few competing closed-source providers, and then maybe the Llama ecosystem is currently a close approximation to something that may grow into something like Linux. Again, I think it's still very early, because these are just simple LLMs right now, and I'm starting to see that these are going to get a lot more complicated. It's not just about the LLM itself; it's about the tool use, the multimodality, and how all of that works. And when I sort of had this realization, I tried to sketch it out. And it kind of seems to me like LLMs are kind of like a new operating system, right? The LLM is a new kind of computer; it's kind of like the CPU equivalent. The context windows are kind of like the memory, and the LLM is orchestrating memory and compute for problem solving, using all these capabilities. So definitely, if you look at it this way, it looks very much like an operating system from that perspective.

A few more analogies: for example, if you want to download an app, say I go to VS Code and I go to Download, you can download VS Code and you can run it on Windows, Linux, or Mac. In the same way, you can take an LLM app, like Cursor, and you can run it on GPT, or Claude, or Gemini, right? It's just a dropdown. So it's kind of similar in that way as well. Another analogy I can describe to you is that we're kind of in this 1960s-ish era, where LLM compute is still very expensive for this new kind of computer, and that forces the LLMs to be centralized in the cloud, and we're all just sort of thin clients that interact with it over the network, and none of us has full utilization of these computers. And therefore, it makes sense to use time-sharing, where we're all just, you know, a dimension of the batch when they're running the computer in the cloud. This is very much what computers used to look like. During this time, the operating systems were in the cloud, everything was streamed around, and there was batching. And so the personal computing revolution hasn't happened yet, because it's just not economical, it doesn't make sense. But I think some people are trying. And it turns out that Mac Minis, for example, are a very good fit for some of the LLMs, because if you're doing batch-one inference, it's all super memory-bound, so this actually works. I think these are some early indications of personal computing, but this hasn't really happened yet, and it's not clear what it looks like. Maybe some of you guys will get to invent what this is, or how it works, or what it should be.

Maybe one more analogy that I'll mention: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal. Like, it's text; it's direct access to the operating system. And I think a GUI hasn't really been invented yet in a general way. Like, should ChatGPT have a GUI, different than just a text bubble? Certainly some of the apps that we're going to go into in a bit have GUIs, but there's no GUI across all the tasks. There are also some ways in which LLMs are different from operating systems, and from early computing, in some kind of unique way. And I wrote about this one particular property that strikes me as very different. It's that LLMs flip the direction of technology diffusion that is usually present in technology. So for example, with electricity, cryptography, computing, flight, the internet, GPS, lots of new transformative technologies, it was governments and corporations that were the first users, because the technology was new and expensive, etc., and only later did it diffuse to consumers. But I feel like LLMs are kind of flipped around. So maybe with early computers, it was all about ballistics and military use, but with LLMs, it's all about how to boil an egg or something like that. This is certainly a lot of my use.

And so it's really fascinating to me that we have a new magical computer, and it's helping me boil an egg. It's not helping the government do something really crazy like military ballistics or some special technology. Indeed, corporations and governments are lagging behind the adoption of all of us, of all these technologies. It's just flipped around. And I think that informs some of the uses of this technology, like where the first apps will be.

So in summary so far: LLM labs fab LLMs; I think that's accurate language to use. LLMs are complicated operating systems. They're circa-1960s computing, and we're redoing computing all over again. They're currently available via time-sharing and distributed like a utility. What is new and unprecedented is that they're not in the hands of a few governments and corporations. They're in the hands of all of us, because we all have a computer, and it's all just software. ChatGPT was beamed down to our computers, to billions of people, like, instantly and overnight. And this is insane. It's kind of insane to me that this is the case. And now it is our time to enter the industry and program these computers. It's crazy.

So that's the first part. Before we program LLMs, we have to kind of spend some time thinking about what these things are. And I especially like to talk about their psychology. The way I like to think about LLMs is that they're kind of like people spirits. They are stochastic simulations of people, and the simulator in this case happens to be an autoregressive transformer. A transformer is a neural network, and it just kind of goes at the level of tokens; it goes chunk, chunk, chunk, chunk, chunk. And there's an almost equal amount of compute for every single chunk. This simulator, of course, basically just has some weights involved, and we fit it to all the text that we have on the internet and so on. And you end up with this kind of a simulator. And because it is trained on text produced by humans, it's got this emergent psychology that is human-like.
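
For intuition on the chunk-chunk-chunk part, here is a minimal sketch of an autoregressive sampling loop. The model and tokenizer are hypothetical stand-ins, not any particular library's API:

    import random

    # One token ("chunk") per step; each step is one fixed-cost forward pass.
    def generate(model, tokenize, detokenize, prompt, n_tokens=50):
        tokens = tokenize(prompt)
        for _ in range(n_tokens):
            probs = model(tokens)                      # distribution over the vocabulary
            next_token = random.choices(
                range(len(probs)), weights=probs)[0]   # stochastic: sample, don't argmax
            tokens.append(next_token)                  # feed the chunk back in
        return detokenize(tokens)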

So the first thing you'll notice is, of course, that LLMs have this encyclopedic knowledge and memory. They can remember lots of things, a lot more than any single individual can, because they've read so many things. It actually kind of reminds me of the movie Rain Man, which I really recommend people watch. It's an amazing movie; I love this movie. Dustin Hoffman here plays an autistic savant who has almost perfect memory. He can read, like, a phone book and remember all of the names and phone numbers. And I kind of feel like LLMs are very similar. They can remember SHA hashes and lots of different kinds of things very, very easily. So they certainly have superpowers in some respects, but they also have a bunch of, I would say, cognitive deficits. They hallucinate quite a bit; they kind of make up stuff and don't have a very good internal model of self-knowledge, not sufficient at least. This has gotten better, but it's not perfect. They display jagged intelligence: they're going to be superhuman in some problem-solving domains, and then they're going to make mistakes that basically no human would make. Like, you know, they will insist that 9.11 is greater than 9.9, or that there are two Rs in strawberry. These are some famous examples. Basically, there are rough edges that you can trip on. They also kind of suffer from anterograde amnesia. I'm alluding to the fact that if you have a coworker who joins your organization, this coworker will, over time, learn your organization. They will gain a huge amount of context on the organization, they go home and they sleep, they consolidate knowledge, and they build expertise over time. LLMs don't natively do this, and this is not something that has really been solved in the R&D of LLMs. Context windows are really kind of like working memory, and you have to program that working memory quite directly, because the models don't just get smarter by default. I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend people watch these two movies, Memento and 50 First Dates. In both of these movies, the protagonists' weights are fixed, so to speak, and their context windows get wiped every single morning. And it's really problematic to go to work or have relationships when this happens. And this happens with LLMs all the time.

I guess one more thing I would point to is security-related limitations of LLMs. For example, LLMs are quite gullible: they are susceptible to prompt-injection risks, they might leak your data, etc. And there are many other security-related considerations. So, basically, long story short, you have to simultaneously think of this thing as a superhuman that also has a bunch of cognitive deficits and issues. How do we program them? They are extremely useful. How do we work around their deficits while enjoying their superhuman powers?

So what I'm going to switch to now is talking about the opportunities: how do we use these models, and what are some of the biggest opportunities? This is not a comprehensive list, just some of the things that I thought were interesting. The first thing I'm kind of excited about is what I would call partial autonomy apps. So, for example, let's work with the example of coding. You can certainly go to ChatGPT directly, and you can start copy-pasting code around, and copy-pasting bug reports and stuff around, and getting code, and copy-pasting everything around. But why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app dedicated to this. And so I think many of you use Cursor; I do as well. Cursor is kind of like the thing you want instead; you don't want to just directly go to ChatGPT. And I think Cursor is a very good example of an early LLM app that has a bunch of properties that I think are useful across all LLM apps. In particular, you will notice that we have a traditional interface that allows a human to go in and do all the work manually, just as before. But in addition to that, we now have this LLM integration that allows us to work in bigger chunks. And so some of the properties of LLM apps that I think are shared and useful: Number one, the LLM apps basically do a ton of context management for you. Number two, they orchestrate multiple calls to LLMs. So in the case of Cursor, there are, under the hood, embedding models for all your files, the actual chat models, models that apply diffs to the code, and this is all orchestrated for you. A really big one that I think is maybe not fully appreciated is the application-specific GUI and the importance of it. Because you don't just want to talk to the operating system directly in text. Text is very hard to read, interpret, and understand, and also you don't want to take some of these actions natively in text. So it's much better to just see a diff as red and green changes and see what's being added or subtracted. It's much easier to just do Command+Y to accept or Command+N to reject; you don't have to type it in text. So the GUI allows the human to audit the work of these fallible systems and to go faster. We're going to come back to this a little bit later as well. And the last feature I want to point out is what I call the autonomy slider. So for example, in Cursor, you can just do tab completion: you're mostly in charge. You can select a chunk of code and use Command+K to change just that chunk of code. You can do Command+L to change the entire file, or you can do Command+I, which, you know, for better or worse, lets it rip on the entire repo in full autonomy. Another app with these properties is Perplexity: it orchestrates multiple LLM calls; it's got a GUI that allows you to audit some of its work, for example, it will cite sources that you can imagine inspecting; and it's got an autonomy slider, where you can either just do a quick search, or you can do research, or you can do deep research and come back to it all later. So that's the autonomy slider again: you decide how much autonomy to give up to the tool for the task at hand.

So I guess my question is: I feel like a lot of software will become partially autonomous, and I'm trying to think through what that looks like. For many of you who maintain products and services: how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways that a human can act? And can humans supervise and stay in the loop of this activity? Because, again, these are fallible systems that aren't yet perfect. What does a diff look like in Photoshop? There are a lot of things we don't know. And also, a lot of traditional software right now has all these switches and menus and all this kind of stuff, all designed for people. All of this has to change and become accessible to LLMs.

So one thing I wanted to stress with a lot of these LLM apps, and I'm not sure it gets as much attention as it should, is that we're now kind of cooperating with AIs. Usually they are doing the generation, and we humans are doing the verification. It is in our interest to make this loop go as fast as possible, so we get a lot of work done. There are two major ways that I think this can be done. Number one, you can speed up verification a lot. I think GUIs, for example, are extremely important for this, because a GUI utilizes the computer vision GPU in everybody's head. Reading text is effortful, but looking at stuff is fun; it's just, like, a highway to your brain. So I think GUIs are very useful for auditing systems, and visual representations in general. And number two, I would say, we have to keep the AI on a leash. I think a lot of people are getting way overexcited with AI agents, and it's not useful to me to get a diff of 1,000 lines of code into my repo. Like, I'm still the bottleneck, right? Even though the 1,000 lines come out instantly, I have to make sure that this thing is not introducing bugs, that it's doing the correct thing, and that there are no security issues, and so on. So basically, yeah, it's in our interest to make the flow of these two go very, very fast, and we have to somehow keep the AI on a leash, because it gets way too overreactive. It's kind of like this: this is how I feel when I do AI-assisted coding. If I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff. This slide is not very good, I'm sorry, but I guess, like many of you, I'm trying to develop ways of utilizing these agents in my own AI-assisted coding. I'm always scared to get diffs that are way too big; I always go in small incremental chunks. I want to make sure that everything is good. I want to spin this loop very, very fast, and I work on small chunks of a single concrete thing. So I think many of you probably develop similar ways of working with LLMs.

I also saw a number of blog posts that try to develop best practices for working with LLMs, and here's one that I recently came across that's quite good. It discusses some techniques, and some of them have to do with how you keep the AI on a leash. So as an example, when you're prompting, if your prompt is vague, the AI might not do exactly what you wanted, and in that case verification will fail. You're going to ask for something else, and if verification keeps failing, you're going to start spinning. So it makes a lot more sense to spend a bit more time being more concrete in your prompts, which increases the probability of successful verification, and you can move forward. I think a lot of us are going to end up finding techniques like this.
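
As a made-up illustration of the point (mine, not from the post), compare a vague prompt with a concrete one; parse_config is a hypothetical function name:

    Vague:    "Clean up this function."

    Concrete: "Refactor parse_config() to (1) return a dataclass instead of a
               dict, (2) raise ValueError on missing keys, and (3) keep the
               public signature unchanged. Do not touch any other file."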

In my own work as well, I'm very interested in what education looks like, now that we have AI, and through that lens. A large amount of my thought goes into how we keep the AI on a leash. I don't think it just works to go to ChatGPT and say, hey, teach me physics. I don't think this works, because the AI gets lost in the woods. So for me, this is actually two separate apps: for example, there's an app for a teacher that creates courses, and then there's an app that takes courses and serves them to students. And in both cases, we now have this intermediate artifact of a course that is auditable: we can make sure it's good, we can make sure it's consistent, and the AI is kept on a leash with respect to a certain syllabus, a certain progression of projects, and so on. This is one way of keeping the AI on a leash that I think has a much higher likelihood of working, where the AI isn't getting lost in the woods.

One more sort of case study I wanted to allude to: I'm no stranger to partial autonomy; I worked on it, I think, for five years at Tesla, and the Autopilot is also a partial autonomy product that shares a lot of these features. For example, right there in the instrument panel is the GUI of the Autopilot, showing me what the neural network sees, and so on. And there's the autonomy slider, where over the course of my tenure there, we did more and more autonomous tasks for the user. Maybe the story that I wanted to tell very briefly is about the first time I drove a self-driving vehicle, which was in 2013. I had a friend who worked at Waymo, and he offered to give me a drive around Palo Alto. I took a picture using Google Glass at the time, and many of you are so young that you might not even know what that is, but yeah, it was all the rage at the time. We got into this car, and we went for about a 40-minute drive around Palo Alto, highways, streets, and so on, and this drive was perfect. There were zero interventions. And this was in 2013, which is now 12 years ago. It kind of struck me, because at the time, when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked; this is incredible. But here we are, 12 years later, and we are still working on autonomy. We are still working on driving agents. Even now, we haven't actually fully solved the problem. You may see Waymos going around, and they look driverless, but there's still a lot of teleoperation and a lot of human-in-the-loop with the driving. So we still haven't even declared success, although I think we're definitely going to succeed at this point; it just took a long time. And so, to me, the takeaway is that there's a very large demo-to-product gap that people don't always intuitively understand very well. If you imagine a big space of situations where things either work or don't work, a demo is works.any: if anything works, you can make a demo. But a product is works.all: lots of things must all work before anyone will trust the product. And this is especially the case in high-reliability areas. Not all areas are like this, but it certainly matters in high-reliability cases. Driving, of course, is one, because a failure means this thing is going to crash, which would be bad. But I would also say software is like this: if you make a single mistake in software, there could be a code path that's going to break, or it's going to introduce a security issue, or a zero-day, or something like that. Software is really tricky, I think, in the same way that driving is tricky. So when I see things like, oh, 2025 is the year of agents, I get very concerned, and I kind of feel like, you know, this is the decade of agents. This is going to take some time. We need humans in the loop; we need to do this carefully. This is software. Let's be serious here, okay?

One more analogy that I always think through is the Iron Man suit. I always loved Iron Man; I think it's so correct in a bunch of ways with respect to technology and how it will play out. What I love about the Iron Man suit is that it's both an augmentation, where Tony Stark can drive it, and an agent: in some of the movies, the Iron Man suit is quite autonomous and can fly around and find Tony and all this kind of stuff. So this is the autonomy slider: we can build augmentations, or we can build agents, and we kind of want to do a bit of both. But at this stage, working with fallible LLMs and so on, I would say, you know, it's less Iron Man robots and more Iron Man suits that you want to build. It's less about building flashy demos of autonomous agents, and more about building partial autonomy products. These products have custom GUIs and UI/UX, and this is done so that the generation-verification loop of the human is very, very fast. But we are not losing sight of the fact that it is, in principle, possible to automate this work. There should be an autonomy slider in your product, and you should be thinking about how you can slide that slider and make your product more autonomous over time. So I think there are lots of opportunities in these kinds of products.

I want to now switch gears a little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for autonomy in software, but also, as I mentioned, it's programmed in English, which is this natural interface. And suddenly, everyone is a programmer, because everyone speaks a natural language like English. This is extremely bullish and very interesting to me, and also completely unprecedented, I would say. It used to be the case that you needed to spend five to ten years studying to be able to do something in software. This is not the case anymore.

So, has anyone, by chance, heard of vibe coding? Okay. This is the tweet that introduced it, but I'm told that this is now a major meme. The funny story about this is that I've been on Twitter for 15 years, or something like that, at this point, and I still have no clue which tweet will become viral and which tweet will fizzle out. I thought that this tweet was going to be one of those, just a shower thought, but it became a total meme. I really just can't tell, but I guess it struck a chord: it gave a name to something that everyone was feeling but couldn't quite put into words. And now there's a Wikipedia page and everything. Yeah, this is like a major contribution now, or something like that.

So Tom Wolf from Hugging Face shared this beautiful video that I really love: these are kids vibe coding. And I find this such a wholesome video; I love this video. Like, how can you look at this video and feel bad about the future? The future is great. I think this will end up being like a gateway drug to software development. I'm not a doomer about the future of this generation, and yeah, I love this video. So, I tried vibe coding a little bit as well, because it's so much fun. Vibe coding is so great when you want to build something super duper custom that doesn't appear to exist, and you just want to wing it because it's a Saturday or something like that. So, I built this iOS app. I can't actually program in Swift, but I was really shocked that I was able to build a super basic app. I'm not going to explain it; it's really dumb. But this was just, like, a day of work, and it was running on my phone later that day, and I was like, wow, this is amazing. I didn't have to, like, read about Swift for five days or something like that to get started. I also vibe coded this app called MenuGen, and you can try it at menugen.app. I basically have this problem where I show up at a restaurant, I read the menu, and I have no idea what any of the things are, and I need pictures. This doesn't exist, so I was like, hey, I'm going to vibe code it. So, this is what it looks like: you go to menugen.app, you take a picture of the menu, and then MenuGen generates the images. Everyone gets five dollars in credits for free when they sign up, and therefore this is a major cost center in my life. This is a negative-revenue app for me right now. I've lost a huge amount of money on MenuGen. Okay.

But the fascinating thing about MenuGen for me is that the code, the vibe coding part, was actually the easy part. Most of the work came when I tried to make it real, so that you can actually have authentication, payments, a domain name, and Vercel deployment. This was really hard, and all of this was not code. All of this DevOps stuff was me in the browser, clicking stuff, and it was extremely slow. It was really fascinating: I had the MenuGen demo basically working on my laptop in a few hours, and then it took me a week to make it an actual deployment. And the reason for this is that it was just really annoying. For example, if you try to add Google login to your webpage, I know this is very small, but there's just a huge amount of instructions from this library telling you how to integrate it. And it's crazy: it's telling me, go to this URL, click on this dropdown, choose this, go to that, and click on that. It's, like, telling me what to do. The computer is telling me the actions I should be taking. You do it! Why do I have to do this? What the hell? I had to follow all these instructions. This was crazy.

So, I think the last part of my talk, therefore, focuses on can we just build for agents? I don't want to do this work anymore. Thank you. Okay.

So, roughly speaking, I think there's a new category of consumer and manipulator of digital information. It used to be just humans, through GUIs, or computers, through APIs. And now it's a completely new thing. Agents are computers, but they are human-like. Kind of, right? They're people spirits. There are people spirits on the internet, and they need to interact with our software infrastructure. Can we build for them? It's a new thing. So, as an example, you can have robots.txt on your domain, and you can instruct, or advise, I suppose, web crawlers on how to behave on your website. In the same way, you can have maybe an llms.txt file, which is just simple Markdown telling LLMs what this domain is about. This is very legible to an LLM. If instead it had to get the HTML of your webpage and try to parse it, that's very error-prone and difficult; it will screw it up, and it's not going to work. So we can just directly speak to the LLM, and it's worth it. I see some of the services now transitioning a lot of their docs to be specifically for LLMs. Vercel and Stripe, as examples, are early movers here, and there are a few more that I've seen. They offer their documentation in Markdown. Markdown is super easy for LLMs to understand. This is great. Maybe one simple example from my own experience as well: maybe some of you know 3Blue1Brown, who makes beautiful animation videos. Yeah, I love this library that he wrote, Manim, and I wanted to make my own animation. There's extensive documentation on how to use Manim, and I didn't want to actually read through it. So, I copy-pasted the whole thing to an LLM, described what I wanted, and it just worked out of the box. The LLM just vibe-coded the animation to be exactly what I wanted, and I was like, wow, this is amazing. So, if we can make docs legible to LLMs, it's going to unlock a huge amount of use cases. And I think this should happen more.
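
For the flavor of it, here is a minimal sketch of what an llms.txt could look like. The format is just Markdown; the site and links here are invented:

    # Example Store
    > An online store for widgets. The docs linked below are plain Markdown.

    ## Docs
    - [Getting started](https://example.com/docs/start.md): accounts and setup
    - [API reference](https://example.com/docs/api.md): endpoints and auth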

The other thing I wanted to point out is that, unfortunately, it's not just about taking your docs and making them appear in Markdown. That's the easy part. We actually have to change the docs, because any time your docs say "click here", that's bad: an LLM agent will not be able to take this action right now. So Vercel, for example, is replacing every occurrence of "click" with an equivalent curl command that your LLM agent could run on your behalf. I think this is very interesting. And then, of course, there's the Model Context Protocol from Anthropic. This is also another way: a protocol for speaking directly to agents as this new consumer and manipulator of digital information. So, I'm very bullish on these.
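
As an invented before-and-after of that kind of docs change (the endpoint and fields are placeholders, not Vercel's actual API):

    Before: "Click 'New Project', then click 'Deploy'."

    After:  curl -X POST https://api.example.com/v1/projects \
                 -H "Authorization: Bearer $TOKEN" \
                 -d '{"name": "my-app", "deploy": true}'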

The other thing I really like is the number of little tools here and there that are helping ingest data in very LLM-friendly formats. So, for example, when I go to a GitHub repo, I can't feed that to an LLM and ask questions about it, because this is a human interface on GitHub. But when you just change the URL from GitHub to Git Ingest, it will actually concatenate all the files into a single giant text, and it will create a directory structure, and it's ready to be copy-pasted into your favorite LLM. Maybe an even more dramatic example of this is DeepWiki, from Devin, where it's not just the raw content of the files: Devin basically does an analysis of the GitHub repo and builds up a whole docs page just for your repo. And you can imagine that this is even more helpful to copy-paste into your LLM. So, I love all the little tools where you basically just change the URL and make something accessible to an LLM. This is all well and great, and I think there's more to come.
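
Concretely, the URL swap looks like this (using one of Karpathy's own repos as the example):

    https://github.com/karpathy/nanoGPT      -> the human interface
    https://gitingest.com/karpathy/nanoGPT   -> one giant LLM-ready text dump
    https://deepwiki.com/karpathy/nanoGPT    -> generated docs for the repo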

One little note I wanted to make: it is absolutely possible that in the future, LLMs will be able to, and it's not even future, this is today, they'll be able to go around and click stuff themselves. But I still think it's very worth basically meeting LLMs halfway and making it easier for them to access all this information, because doing it the clicking way is still fairly expensive, I would say, and a lot more difficult. And so, I do think there will be a long tail of software that won't adapt, because these are not live players or active repositories of traditional infrastructure, and we will need those clicking tools for that. But for everyone else, I think it's very worth meeting LLMs halfway. So, I'm bullish on both.

So, in summary, what an amazing time to get into the industry. We need to rewrite a ton of code. A ton of code will be written by professionals and by vibe coders. These LLMs are kind of like utilities, kind of like fabs, but especially, they're kind of like operating systems. And it's so early; it's like 1960s operating systems. And I think a lot of the analogies cross over. These LLMs are kind of like these fallible people spirits that we have to learn to work with, and in order to do that properly, we need to adjust our infrastructure accordingly. When you're building LLM apps, there are practical ways of working effectively with these LLMs and some of the tools that make that possible: how you can spin this generation-verification loop very, very quickly and basically create partial autonomy products. And then, yeah, a lot of code also has to be written more directly for the agents. But in any case, going back to the Iron Man suit analogy, I think what we'll see over the next decade, roughly, is that we're going to take the slider from left to right. It's going to be very interesting to see what that looks like, and I can't wait to build it with all of you. Thank you.

that's all. let's keep in touch?