An Interview with Dr. Scott Aaronson

(via MIT News)

 This interview has been edited for clarity.

Dr. Scott Aaronson is the David J. Bruton Jr. Centennial Professor of Computer Science at UT Austin. He is known for his research in computational complexity theory and quantum computing, and more recently for his work on AI alignment at OpenAI, the creators of ChatGPT. I sat down with Dr. Aaronson to talk about the nature of quantum computing, why we should care, and his thoughts on recent developments in artificial intelligence.

Jackson: Good afternoon, Professor, thank you for joining us. Can you start by introducing yourself?

Dr. Aaronson: Thanks for having me. I’m Scott Aaronson. I’m a computer science professor here at UT. I’ve spent 20 years working on the theory of quantum computation, but I’m actually on leave for a couple of years to work at OpenAI on the theoretical foundations of AI safety.

Jackson: Can you summarize your area of research?

Dr. Aaronson: I’m a theoretical computer scientist. My training is mostly in computational complexity theory, which is the field that studies the inherent capabilities and limitations of computers under constraints on resources. So, you know, what can you do with limited time and limited memory? What is the inherent scaling required to solve a problem? Is it polynomial, or is it exponential in the size of the problem that you’re trying to solve? Quantum computing theory, in particular, is the field that asks: how does quantum mechanics change the answers to those questions?

The key thing that quantum mechanics has told us about the world for 100 years is that you can have what are called superpositions of states. If I have a particle going through a screen with two slits in it, and I don’t look to see which slit it goes through, then I can’t say either that it goes through the first slit [or that] it goes through the second. I can’t even say that it has some probability of going through each slit. I have to say that nature, at the fundamental level, uses different rules of probability than the ones that we are used to. I have to say that it has an amplitude to go through each slit. These amplitudes are related to probabilities, but they’re not probabilities; we can see that because they can be positive or negative, or even complex numbers. The key thing that quantum mechanics says is that if I have two possible ways that a physical system could evolve in isolation, I have to assign each of them one of these complex numbers, these amplitudes. That’s a really staggering claim, if you think about it, because if I had a computer with, let’s say, 1000 bits, but these are quantum mechanical bits, what we call qubits, then each one could be in a superposition of the zero state and the one state. But it’s actually much more than that, because the rules of quantum mechanics tell us that every possible string of 1000 bits can get its own amplitude. In other words, to keep track of 1000 qubits, which could be the states of 1000 photons or 1000 electrons, just 1000 little particles, which is not a lot in terms of particles, we need 2^1000 numbers, which is more numbers than could be written down in the whole observable universe. There is this exponential reality at the core of quantum mechanics. It’s been known since Schrödinger wrote down his equation a century ago, but its full [importance] wasn’t really realized by people until the idea of quantum computing came along in the 1980s, and then especially in the 1990s: quantum mechanics could change the answers to these questions about what is efficiently computable. One of the most dramatic demonstrations of that came in 1994, when a guy named Peter Shor discovered that there is an algorithm using a quantum computer to factor numbers, for example, to factor an n-digit number, using a number of steps that only grows like n^2, roughly. Whereas, to this day, the best-known classical algorithms for factoring use a number of steps that grows exponentially with the cube root of n.
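[Editor’s note: the arithmetic behind the “2^1000 numbers” claim is easy to check. A minimal, purely illustrative sketch in Python:]

```python
# Each additional qubit doubles the number of amplitudes:
# describing n qubits in general takes 2**n complex numbers.
n_qubits = 1000
amplitudes = 2 ** n_qubits
print(len(str(amplitudes)))  # 302 -- a 302-digit count of amplitudes,
                             # dwarfing the ~10^80 atoms in the observable universe
```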

But why does anyone care about this? Well, anytime you visit a website with HTTPS, or you send your credit card number over the internet, your data is protected by encryption. That encryption depends on the belief that factoring is a hard problem, [and that] a few closely related problems in number theory are hard. What Peter Shor discovered was that if someone could build a computer out of qubits, that is, a quantum computer, then that would no longer be true.

You could say, after Shor’s discovery, there were at least three logical possibilities. The first one is that quantum mechanics is wrong, or our understanding of it is wrong, and if we try to build this quantum computer, then we’ll just discover a breakdown of quantum mechanics. That would be a revolution in physics, way more exciting even than a mere success in building the quantum computer, because it would change our understanding of how the universe works. The second possibility is that quantum computers can actually be efficiently simulated by classical ones via some yet-unknown algorithm. That would mean, in particular, that there would have to be a fast algorithm on conventional computers for factoring numbers and, thereby, for breaking the cryptography that underlies the internet. And then the third possibility is that quantum computers really would change the limits of what is efficiently computable. Those limits were not knowable a priori. You know Alan Turing and his friends, who thought about the definition of a computer? If you care about what is efficiently computable, then they didn’t quite get it right, because they didn’t take into account quantum mechanics. So all three of those possibilities are incredible, right? And at least one of them is true. A lot of the work that I’ve done over the last 20 years has just been to try to get a deeper understanding of the class of problems that quantum computers can solve efficiently. How do they change the theory of computation? You can just go through the entire classical theory of computation: cryptography, zero-knowledge proofs, communication complexity, there are just all these classical topics. And for each and every one of them, you can look at how quantum mechanics changes the situation. That’s been a lot of what’s kept us busy for the last 25 years.

Jackson: You’ve just run right into my next question. The tagline of your blog Shtetl-Optimized is, “If you take nothing else from this blog: quantum computers won’t solve hard problems instantly by just trying all solutions in parallel.” Why, in layman’s terms, is that picture wrong? And what is the corrected version?

Dr. Aaronson: Good.

What I’ve been explaining to you is why I got interested in quantum computation: it brought together some of the deepest questions in physics and computer science in this staggering way that I think very few people expected. Scientifically, the case for studying this seemed compelling. But especially within the last decade, quantum computing has exploded into a lot of people’s consciousness for completely different reasons. There are now billions of dollars of investment in quantum computing, some of it from governments, some from big companies like Microsoft, Google, IBM, and Amazon, and a lot of it from venture-backed startups. What I think is unfortunate is that a lot of these companies have given an impression to lay people, to journalists, to funders, to business people, of what a quantum computer would be good for that is wildly exaggerated, an impression that all of us who work in this field know is wildly exaggerated. But some of us find it important to actually speak out in public about it, because we value truth more than funding. And so the popular picture that has been sold to, and bought by, a lot of the public is that a quantum computer does just about everything exponentially faster than a conventional computer, that it is just what you want to replace your conventional computer with for just about everything. And the way that it would do it, the way that almost every popular writer wants to explain it, is that a quantum computer would just try all of the possibilities in parallel. A classical computer is stuck trying each possible solution to your hard problem one by one, each possible key for your cryptographic code, for example, and a quantum computer would just go into a superposition of all of them, and then magically pick the best one or something.

Now, that may sound too good to be true. And I can tell you exactly why it is too good to be true. Here is the issue: it is true that with a quantum computer, you can create a superposition over all possible solutions to a problem, even if there are exponentially many of them. The issue is that for a computer to be useful, at some point you have to look, you have to measure, you have to get an output. And one of the central rules of quantum mechanics is about measurement; it says that measurement is a destructive operation. When I take an object that is in a superposition and I make a measurement of it, then I force it down to a single outcome. There’s this very basic rule called the Born rule, which says that the probability that I see some particular outcome is equal to the square of the absolute value of the amplitude for that outcome. What that means is that if I just measure an equal superposition over all answers, not having done anything else, then all I will see is a random answer. And if I just wanted a random answer, well, I could have flipped a coin a bunch of times or used some normal random number generator; I didn’t need to build a quantum computer. So the only hope of getting a speed advantage from a quantum computer is to take advantage of the way that amplitudes, being complex numbers, work differently from the probabilities that we’re used to. Now we really come to the central phenomenon of quantum mechanics, which is called interference. How do we even know that the world is described by these amplitudes in the first place? Well, if we come back to the situation I mentioned before, where we shoot a photon at a screen with two slits in it, what we find is that there are certain spots behind that screen where the photon never shows up. And yet, if I were to close off one of the slits, then the photon can appear in those spots. So, to say that again: by decreasing the number of paths that the photon can take to reach a certain point, I can increase the chance that it gets to that point. How is such a thing possible? Well, the way that quantum mechanics works is that to calculate the amplitude that something is going to happen, say that a photon will hit a certain spot on the screen, I have to add up a contribution from every way that it could have happened. And if one of those contributions was positive and the other one was negative, then they can cancel each other out, so that the total amplitude is zero, which means that that thing never happens at all. Whereas, if I block one of the paths, then now I have an amplitude that’s only positive, or only negative, and now the thing can happen.
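[Editor’s note: the two-slit arithmetic Dr. Aaronson describes can be written out in a few lines. A minimal Python sketch, with amplitude values made up purely for illustration:]

```python
# Two paths to the same spot on the screen, each assigned an amplitude.
amp_slit1 = 0.5
amp_slit2 = -0.5                    # opposite sign: the paths interfere destructively

both_open = amp_slit1 + amp_slit2   # amplitudes add when the path is unobserved
print(abs(both_open) ** 2)          # Born rule: probability = |amplitude|^2 -> 0.0

one_open = amp_slit1                # block the second slit
print(abs(one_open) ** 2)           # -> 0.25: fewer open paths, *higher* probability
```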

[Physics professor and Nobel laureate] Richard Feynman used to say that if you just understand this two-slit experiment, then you understand all of quantum mechanics. People use all kinds of confusing-sounding words to describe it, but it’s all just more and more instances of this one phenomenon of interference of amplitudes. In particular, how does a quantum computer work? Well, basically, a quantum computer is a device for choreographing a gigantic pattern of interference of amplitudes, not just among two outcomes, like the two slits that a photon could go through, but among, let’s say, 2^1,000 or 2^1,000,000,000 different possible states. The goal with every quantum algorithm is to set things up in such a way that for each wrong answer, each output that you don’t want to see, the different contributions to its amplitude cancel each other out. Some are positive, some are negative, they’re pointing in different directions in the complex plane, and they average out to nearly zero. Whereas for the correct answer, for the output that you do want to see, you want all the contributions to its amplitude to be pointing in the same direction, so that they reinforce each other. Now, if you can arrange that, then when you make a measurement, you will see the right answer with high probability. If you don’t see it, you can always just repeat the computation several times until you do. But the name of the game is to use interference to boost the probability of the right answer well beyond what a classical computer could have gotten you in any comparable amount of time.
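[Editor’s note: the simplest possible instance of this “choreographed interference” is a single qubit sent through two Hadamard gates. A sketch using plain NumPy rather than any quantum-computing library:]

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
state = np.array([1.0, 0.0])                  # qubit starts in |0>

state = H @ state  # equal superposition: amplitudes (0.707..., 0.707...)
state = H @ state  # the two contributions to |1> cancel; those to |0> reinforce
print(np.abs(state) ** 2)  # Born rule -> [1. 0.]: all probability driven back to |0>
```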

Now, the hard part is, you’ve got to do that even though you yourself don’t know in advance which answer is the right one. Because if you already knew, what would be the point, right? And you also have to do this faster than a classical computer could do the same thing, because, otherwise, what would be the point? And that means you have to beat not only the brute-force classical method; you have to beat the cleverest classical algorithm that anyone could come up with. That’s the hardest part of all, because to this day we don’t even fully know what the limits of classical algorithms are. Some of the deepest open problems in all of math, like the P versus NP problem, if you’ve heard of that, are about just what the limits of classical computers are in solving hard combinatorial problems. So we don’t even know that. And so it can be really, really hard to know: are we really beating what a classical computer can do? But we have to try. That’s kind of what separates the serious work in quantum algorithms from the non-serious: you really ask this question carefully, not just what can a quantum computer do, but what is it useful for? In the sense of: what is it actually the best algorithm for?

There are sort of two main application areas that tower over all the others. Many people would like there to be more than these two, but after 30 years of research, I think these are still the two biggest ones. The first one is just simulating quantum mechanics itself. This was the original application that Richard Feynman had in mind when he raised the idea of a quantum computer more than 40 years ago. I think that [this] is still by far the most economically important application of quantum computers that we know about, because it could help with designing new materials, better batteries, solar cells, chemical reactions, and industrial processes. It could help with all sorts of problems of that kind, where people already push high-performance classical computing to the limit trying to simulate, for example, systems of many correlated electrons. But there are limits to what they can do, because of this exponential reality of amplitudes in quantum mechanics. A quantum computer is the device that is tailor-made for solving that problem; it is the universal quantum simulator. So that’s where I hope there can be a lot of economic value, for chemistry, for materials science, and where we have a good shot at beating the best that can be done with a classical computer, maybe even relatively soon.

And then the other big application is just breaking the encryption that currently underlies the internet. Now, it’s far from clear that that’s a positive application for the world. It’s good for whatever spy agency gets it first, if no one else knows that they have it. But the natural response to that is to just upgrade the way that we do encryption. We already know methods of encryption that are called quantum-resistant, or at least are believed to be quantum-resistant; they’re somewhat less efficient, but they seem to work. It would take a massive effort, maybe a decade, just to upgrade everyone’s server and web browser and router to use these quantum-resistant encryption methods. But if and when that happens, we’ll just be back where we started.

Now, the thing that all the popularizers and, let’s say, the pitchmen or the salespeople want to tell you is that quantum computers are going to be revolutionary in ways beyond those two application areas — in particular, for optimization, for machine learning, for combinatorial search. These are much more bread-and-butter problems for computer science, but one of the main things that we’ve learned in quantum computing theory is that for those kinds of tasks, the speedup that a quantum computer can offer you seems to be much more modest than what it can offer for either quantum simulation or breaking public-key encryption. The speedups for optimization and machine learning problems tend to be like this: if I have a problem that with my classical computer would take me n steps, then with a quantum computer, I might need a number of steps that scales only like the square root of n. That’s called a Grover speedup. And that’s extremely interesting; theoretically, it has an enormous range of potential applications. But the downside is that it’s not an exponential speedup. Going from n to √n, we call that a quadratic speedup. And that has to contend with the enormous overhead that would come from running an error-corrected quantum computer at all, which might increase the cost by a factor of a million or more. So basically, in order to get a win via these Grover-type speedups, what I need is for n, which is, let’s say, the number of data points that I’m analyzing or the size of my problem, to be large compared to a million times the square root of n. If you think about that, that only happens when n is quite enormous. For that reason, even after we get practical quantum computers, it might be quite a while after that before they actually become a net win for optimization, for machine learning, for stuff like that. And then, as far as we know today, based on the algorithms that we currently know, the win would be relatively modest, much more modest than it is for quantum simulation or for code-breaking. That’s not what you’re going to hear if you read any of these companies’ press releases, but, since you’re asking me, as an academic, I can tell you the truth.
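[Editor’s note: the break-even arithmetic here is worth seeing explicitly. A back-of-the-envelope Python sketch; the million-fold error-correction overhead is the illustrative figure Dr. Aaronson uses, not a measured constant:]

```python
import math

overhead = 1e6  # assumed cost factor of quantum error correction (illustrative)
for n in (1e9, 1e12, 1e15):
    classical = n                      # ~n steps classically
    quantum = overhead * math.sqrt(n)  # ~sqrt(n) steps, times the overhead
    print(f"n={n:.0e}  classical={classical:.0e}  quantum={quantum:.0e}  "
          f"net win: {quantum < classical}")
# Crossover at n = overhead**2 = 1e12: the Grover advantage only
# pays off for enormous problem sizes.
```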

Jackson: That’s excellent. My next question was related to the implications of quantum computing, but you hit that nail on the head. So switching gears, can you tell us a little bit about Complexity Zoo?

Dr. Aaronson: The Complexity Zoo is just a little website that I made when I was a graduate student 20 years ago. And it is basically just an encyclopedia of complexity classes. 

What is a complexity class? These are the basic objects of study in computational complexity theory, the field that I work in. They’re classes of problems that are solvable within different resource constraints. Maybe the most fundamental complexity class is called P, which stands for polynomial time. This is just the class of all of the yes-or-no problems that are solvable by a conventional digital computer (like this one) using a number of steps that scales like the size of the input raised to some fixed power. Examples of problems in P would be: given a string of letters, is it a palindrome or not? You know, very simple things like that. A much more interesting example: given an integer, written out in binary or whatever, is it prime or is it composite? That turns out to be a much, much easier problem than “find the prime factors if it is composite.” Given a graph, like a map of cities, is it connected? Is every city reachable from every other? Given lists of men and women and who is willing to date whom, can everyone be paired off with a partner they’re happy with? That’s another non-obvious example of a problem that’s solvable in P. These are the kinds of things that undergrad computer science majors learn in their algorithms class.
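[Editor’s note: the palindrome example really is this simple. A toy illustration in Python:]

```python
def is_palindrome(s: str) -> bool:
    # A problem in P: answerable in a number of steps linear in the input size.
    return s == s[::-1]

print(is_palindrome("racecar"), is_palindrome("quantum"))  # True False
```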

And then there’s also a complexity class called NP, which stands for nondeterministic polynomial time. This is the class of all the problems where it might be exponentially hard to find the right answer, but if there is a valid answer, then it can be recognized by a polynomial-time algorithm. That’s a bit more subtle of a concept. The standard examples of NP problems would be, let’s say, a sudoku puzzle or a jigsaw puzzle. They might be very, very hard to solve, but if someone solves one, then they just have to show you the solution, and you can check it. Also, “find the prime factors of this number” [is] another good example. However hard it is to find them, if someone gives you what they claim are the prime factors, then, at least using your computer, it’s pretty easy to check whether they’re right or not. The most famous problem of theoretical computer science is the problem of “is P equal to NP,” which is asking: for all the problems where you can efficiently recognize the correct answer, can you also efficiently find [it]?
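[Editor’s note: the factoring example makes the find-versus-check asymmetry concrete. A toy verifier in Python; the function name is ours, for illustration:]

```python
def check_factorization(n: int, factors: list[int]) -> bool:
    # NP-style verification: however hard the factors were to *find*,
    # multiplying the claimed factors back together is fast.
    # (A full verifier would also check that each factor is prime,
    # which is likewise polynomial-time.)
    product = 1
    for f in factors:
        if f <= 1:
            return False
        product *= f
    return product == n

print(check_factorization(15, [3, 5]))  # True
print(check_factorization(15, [2, 7]))  # False
```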

I like to joke that if we were physicists, we would have just said, well, obviously P is not equal to NP; we would have just called that a law of nature and given ourselves Nobel Prizes for its discovery. And if it later turned out that P was equal to NP, we would have given ourselves more Nobel Prizes. But because we are, let’s say, cousins of mathematicians, we have to say, “Well, we don’t actually have a proof that P is not equal to NP.” This is now considered one of the great open problems in all of mathematics. There are many more complexity classes beyond P and NP. One example, relevant to what we were talking about before, is the class of all the problems that are solvable in polynomial time using a quantum computer. That class is called BQP, for bounded-error quantum polynomial time. It was defined 30 years ago by Umesh Vazirani, who was my PhD advisor, and his student Ethan Bernstein. You could say that NP and BQP both contain P; they generalize it in two different ways. Another open question is the relationship between NP and BQP: we don’t currently know whether either of them contains the other. Those are three examples of complexity classes.

Now, if you go to my Complexity Zoo website, you’ll find about 500 others. I made this in grad school, mostly just because I was learning about these classes and I couldn’t keep track of them in my head. There seemed to be no place on the internet where someone had just written them all down, along with everything that was known about how they all relate to each other. When you’re a grad student, you have time to do such things. I said, “Why don’t I just spend a few weeks and start doing that?” And then that became a useful resource for other people. I would say I haven’t updated it a whole lot in over a decade, so it’s kind of out of date at this point, and a lot of its content has been absorbed into Wikipedia. I would love for someone to update and refresh the zoo and make it more relevant again. But that was at least the role that it played in the past.

Jackson: Sounds like a very cool project. Can you now tell us about something a bit more recent, your work at OpenAI?

Dr. Aaronson: Absolutely! A year and a half ago, some people from OpenAI approached me, and they said, “Well, you know, we are fans of your blog. We saw something here.” A lot of the people who read my blog are the same people who read the rationalist blogs, things like Slate Star Codex, or what’s now called Astral Codex Ten, or LessWrong.com, or Eliezer Yudkowsky. Because of that, I’ve known these people since 2006 or 2007 or so. I’ve known about their obsession, which has been the possibility of superintelligent AI and what happens once it gets created. Does it just take over the world? How do we align it with human values? What is even the role of humans in a world where AI can do everything that we can do, better than we can do it? I enjoyed talking and speculating about these things, but I always kept it at arm’s length. Partly because the AI Doomers seemed kind of like a cult; they were very separate from academic computer science, and they acted cult-like in many ways. But also: supposing that I agreed with them, how do we make progress on it? Where is the research program? Where can we act? Because what they [seemed] to do was a lot of writing blog posts where you just say, “assume a Godlike superintelligence that wants to destroy us,” and then you consider 30 different ways that we might control it, and then you conclude that all of those ways are going to fail, because it is superintelligent and will have anticipated everything you could do and has a way around it, and then you get very depressed.

This just didn’t seem like it was making a whole lot of progress. It’s not enough for a problem to be the most important in the world; there has to be something that you can do about it. I was vaguely aware of all this. Then, a few years ago, AI started becoming incredibly capable — much more capable than most of us would have predicted. I played around a little bit with GPT-3. I knew that there were people who were dismissive of it, but I was not, because I had seen the previous generations of chatbots, which were all just sort of variations on ELIZA from the 1960s. And I could see what a qualitative leap GPT represented compared to what had come before. You could actually talk to it, and it would give you coherent responses. Sure, it made mistakes, the same way a child might make mistakes, but it gave you responses that, if a human had produced them, you would have said, “Of course it understands you.” That was really new; that just did not exist on Earth until a few years ago. Even back in 2014, I had wondered, “Why doesn’t someone try training a neural network on all the text on the internet [and] see how well that performs as a chatbot?” What OpenAI did, starting around 2018, is that they actually did that, and they did it at a large scale. The basic ideas of neural networks, backpropagation, big data: these are old ideas. They go back decades. I learned about them when I was an undergrad in computer science in the 1990s. At that time, neural nets just didn’t work very well. There were a few crazies like Ray Kurzweil who were saying, “Well, just you wait. Just wait for Moore’s Law to increase the computing power by a factor of a million, and for the Internet to increase the amount of data by a factor of a million, and for the data and the compute to be comparable to what the human brain has, and then there’s going to be a phase change, and these things will start being intelligent.” That just seemed like the most unsupported speculation I’d ever heard. How does anyone know that?

But here we are 20 years later, and I think we have to acknowledge that something like that has happened. We have to stop inventing excuses for why it doesn’t really count, and we have to update. When OpenAI approached me, they said, “Do you want to go on leave for a while and work for us, and figure out what theoretical computer science can do to help make AI aligned, to prevent it from doing things that are unsafe?” I was certainly interested, because I had seen how spectacular the progress was. But I also felt like, “Why do you want me? I’m a quantum computing person, right? What do I even know about this?” They made a case that AI alignment is just starting to be a field that we can make progress on, because now we finally have these powerful AIs, and people don’t even know what the right questions are, or what the right mathematical models are. So maybe this is a place where a theoretical computer scientist would be useful. And so then I said, “Well, I have to teach at UT. So maybe in some future year, I’ll arrange my schedule to get involved.” And they said, “Trust us, this is gonna be a really big year for AI. You want to get involved this year.” And I said, “Well, okay, but I’m in Austin, my family’s in Austin, and my students are here. You’re in San Francisco.” And they said, “Well, that’s fine. You can work from Austin and just fly out to visit us once a month.” So they made it very hard to say no. I took the job with OpenAI, and I’ve enjoyed it. This is now my second year doing it.

The projects that I’ve worked on at OpenAI, the stuff I’ve been doing for them, have nothing to do with quantum computing. I mean, quantum computing and modern machine learning both involve linear algebra in very, very high-dimensional spaces, but that’s about the extent of the commonality. This is all classical computing. It’s very different from quantum computing, where the theory has just been way ahead of experiment. The experiments are only now, in the last few years, getting to the stage that’s really interesting to us as complexity theorists, where classical computers are having a hard time simulating them. And in any case, we’ve known the basic theoretical model of what a quantum computer is for 30 years, and it hasn’t changed at all. In AI, and AI safety, by contrast, experiment is way ahead of theory; it’s just the opposite. We have these incredibly powerful neural networks, and no one can really explain why they work, or even predict in advance, very well, how well they’re going to work. It’s just a matter of: you train them, you try them out, and you see what happens. And if they don’t work, then you scale them up, you make them bigger, you try again, and sometimes that causes it to work. And so, when you don’t even understand how these things work, how on earth could you hope to prove that it’s not going to do anything dangerous, or prove that it’s aligned with your values — whatever the hell that means.

The easiest-to-explain project that I’ve worked on in the last year or so has been about watermarking. One thing that I realized a few months before ChatGPT was released: I had these really dedicated trolls on my blog who were impersonating my colleagues, and I was like, “Oh my God, GPT is going to make that so easy for people to do, right?” And not only that, but every student will want to use it to cheat on their work. And every propaganda or spam operation is going to want to use it to just generate text that seems like it came from a human. Wouldn’t it be great if we just had a way to identify which text was generated by GPT and which was not? That would simultaneously address all these different categories of misuse.

There are different ways that you could imagine approaching that problem. One of them would be for OpenAI to just store a giant database of everything that GPT ever generates, and then you consult that database and ask, “Did you ever generate this?” But it’s hard to do that in a way that gives people appropriate guarantees about their privacy; that’s the main concern there. A second approach would be to treat this as just yet another AI problem: you train another neural network to do the best it can at distinguishing AI-generated from human-generated text. Now, this has actually been done. There’s a service called GPTZero that was created by Edward Tian, who was then an undergrad at Princeton. Soon after he put it online, his server crashed under the load of thousands of professors desperate to use it to see if their students were using GPT to write their term papers, which, probably, a large fraction of students have either done or been tempted to do. I’m sure that neither you nor any of your readers have.

Jackson: Of course, I guarantee every single one of them.

Dr. Aaronson: All right, good. But other students. Now, the main issue with these discriminator models is just that the accuracy is not quite good enough. People were having a lot of fun with them; they would sometimes decide that passages from Shakespeare or the Bible were probably generated by AI. This is an issue because even a few percent false-positive rate means a lot of students falsely accused of cheating based on the outputs of these models. That brings me to a third approach, which is inserting a watermark into the outputs of your language model. What that means is that you slightly change the way that the language model works, which words it chooses, in a way that is totally undetectable to a normal user. It looks just like normal language model output. But when the language model is equally balanced between multiple choices for how to continue the text, then we make those choices systematically, in a pseudorandom way that biases some score that we can calculate later. If we are given a long enough sample of text, that score can tell us with extremely high confidence that, yes, this came from GPT. I worked out the basic theory of how to do that a little over a year ago, and I started giving talks at OpenAI about it. We implemented a prototype [that] seemed to work pretty well. Other academic groups, at Maryland, Stanford, and Berkeley, have independently come up with ideas similar to mine, which made me feel better that, you know, I wasn’t being completely crazy with this.
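[Editor’s note: a heavily simplified Python sketch of the kind of scheme described above, based on Dr. Aaronson’s public talks. The function names, the context window, and the scoring rule here are illustrative assumptions, not OpenAI’s actual implementation:]

```python
import hashlib
import math

def prf(key: str, context: tuple, token: str) -> float:
    """Pseudorandom number in (0, 1), derived from a secret key and recent context."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def pick_token(key: str, context: tuple, probs: dict) -> str:
    # probs maps each candidate token to the model's (positive) probability.
    # Choosing the token that maximizes r ** (1/p) reproduces the model's
    # distribution on average, yet is deterministic given the secret key;
    # that hidden determinism is the watermark.
    return max(probs, key=lambda t: prf(key, context, t) ** (1 / probs[t]))

def detection_score(key: str, tokens: list, window: int = 3) -> float:
    # For watermarked text, each chosen token's r is biased toward 1, so the
    # sum of -ln(1 - r) grows much faster than it would for human-written text.
    total = 0.0
    for i in range(len(tokens)):
        context = tuple(tokens[max(0, i - window):i])
        total += -math.log(1 - prf(key, context, tokens[i]))
    return total
```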

Watermarking is still not deployed by any of the major AI companies. There are a lot of issues beyond the technical ones that we’ve had to contend with, like: will customers hate this? Will they say, “Why is Big Brother watching me?” and then switch to a competing language model? Or can we coordinate all the AI companies so that they will all do this? As you may have seen just this past week [preceding November 4th], there was new guidance on AI safety from the White House, and one of the things that it explicitly talks about is watermarking. I hope that that’s going to be a big step forward. President Biden mentioned watermarking in his speech as a thing [of which] he was in favor — I’m not used to the President calling out the thing that I’m working on — but, if you look at the fine print of what was agreed to, it mostly focuses on watermarking of audiovisual content. That became a sticking point.

Now, another sticking point is: who should get access to the detection tool? Do you just let anyone check whether something came from GPT? Then an attacker could use it: they could take their document and just keep modifying it until it no longer triggers the detection. Or do you only provide access, let’s say, to academic grading websites, like Canvas or turnitin.com, or to journalists researching misinformation? That’s another big question. And then you could ask: are there people who should be allowed to use a language model without having to disclose that they’re using it? What about English-as-a-second-language speakers, millions of whom are now using GPT to improve the fluency of their writing? Is it unfair to them if everything they write is watermarked because they used GPT?

There are ethical trade-offs, there are social and political questions, and I’ve been having to learn a lot about those. I’ve worked on other projects at OpenAI as well. In fact, when I tell my bosses about progress on watermarking, they tend to say, “Well, that’s great, and you should keep doing that. But what we really want is a mathematical definition of what it means for AI to love humanity.”

But that’s harder to make progress on.

Jackson: That goes into my final question. Given the rapid advances in AI, there’s been a lot of talk about existential risks from AGI. I’m sure you’re familiar with the survey from AI Impacts, which questioned over 700 machine learning researchers and found that respondents, on average, estimated about a 5% probability of AI resulting in a terrible outcome. So what are your thoughts on existential risks from AI? And what, if anything, do you think could be done about them?

Dr. Aaronson: I do not at all dismiss the possibility of existential risk involving AI. I take it seriously. Given the unbelievable rate of progress in the last five years, progress that almost none of us predicted, if you just extrapolate that progress to the next five years, and to the five years after that, it immediately raises the question of what happens when this becomes better than us. You would be crazy not to ask that question. If people say we can dismiss this because it sounds too much like the Terminator movie, well, science fiction writers predicted a lot of things that eventually happened. They predicted cell phones; they predicted humans going to the moon; and then those things happened for real. So those are just not very strong arguments. But I’m certainly not in the camp that regards existential risk as certain. The group around [AI and decision theory researcher and rationalist author] Eliezer Yudkowsky tends to believe that, on our current path, we are basically doomed for certain: that AI becomes more powerful than us, and then it just completely predictably destroys the world. They have a particular kind of scenario that they’re worried about, where basically the AI will be deceptively aligned. It will pretend that it’s doing what we want, that it’s aligned with us, but really it will just be biding its time until the one moment when it will suddenly strike and kill all humans and take over. But I think that that is just one scenario out of many that we can imagine. I find it much more plausible that there will be AIs that try to do very bad things, either because they’ve acquired bad goals on their own or because humans have given them bad goals. And at first, they’ll be pretty ineffective. I don’t know if you’ve seen this agent called ChaosGPT, which has been given the goal of destroying the world. It just keeps coming up with evil plans, and then not really being able to execute on any of them. AIs will either be not very effective at doing bad things, or [will] do bad things in a much more limited way than destroying the entire world. We will be able to learn something from those failures, and we’ll be able to update.

I think the right way to think about this is: sometimes people want to say that, well, there are these serious risks like nuclear war and climate change, and then there’s this science fiction risk of AI taking over the world, and they don’t even want to discuss the AI taking over the world, because that just feels too science fictional to them. Then at the other extreme, you have the Doomers, the Yudkowskians, who say, “No, actually, AI taking over the world is the only risk worth paying attention to. All of the other risks are just trivial by comparison, because probably some humans would survive a nuclear war.” In my mind, all of [these] things are interrelated. The way that I think about it, humans now have technological power that is enormously greater than their wisdom. That’s arguably been true at least since the beginning of the nuclear age, if not earlier. There has been existential risk for generations now, from runaway climate change, from nuclear war, [or from a] fascist or authoritarian takeover of much of the world. Whatever existential risks we have to worry about in the next century, my guess would be that AI will be involved in them somehow. Why? Because AI will be involved in everything.

The one thing I could add to that is that I want to try to make progress by working on near-term misuses of AI. You may say academic cheating is a very far cry from destroying the world, but the nice thing about addressing academic cheating is that we can actually make progress on it now. So we can hopefully learn something from near-term alignment issues, such as watermarking, that will carry over to the more general problem of AI alignment.

Just like in any research area, what you should work on is not the most important problem or the hardest problem; you should work on the easiest problem that you don’t already know how to solve. That’s the way that you make progress, and that’s how you gain new knowledge that you can then use going forward.

Jackson: Where can readers find your work?

Dr. Aaronson: I’m not hard to find on the internet. My website is scottaaronson.com, and I’ve got my blog there. I’ve got a bunch of popular writings, a bunch of talks on YouTube, and my research papers.

Jackson: And do you have anything else you’d like to say to our readers? 

Dr. Aaronson: No, I think we covered a lot. 

Jackson: Excellent. Thank you for taking the time to talk with us. 
