The Strangeness Is the Point
On Teaching What We Don't Understand
This past week, a dear friend gave me the chance to offer feedback on a draft syllabus for the Great Books-focused program they run (mainly for highly accomplished empty-nesters seeking a chance to re-engage with deep questions), which is considering a week-long intensive seminar on artificial intelligence and philosophy. The draft syllabus felt, to me, like it was barking up the wrong tree: it treated artificial intelligence mostly as an entry point to the existing literature on the philosophy of mind rather than as a subject in its own right. I expressed some opinions on why I thought that was a mistake; a lightly revised version of my email is below.
Dear [Name],
I hope you’re well and that you had a lovely Thanksgiving.
I wanted to write a little further about why I think it’s essential, for [your program’s] purposes, to treat AI not merely as an outgrowth of historical questions about cognition and personhood (e.g., the golem, various Platonic dialogues, Searle’s “Chinese Room” thought experiment) but as a technology that is deeply strange, built by strange people, in ways that specifically defy expectation. That, in turn, is why I think your program participants will be incompletely served by a syllabus that doesn’t reflect that strangeness.
AI, as it’s emerged so far, isn’t what we thought it would be. For decades, the overwhelmingly dominant paradigms for AI involved explicitly structuring human knowledge and cognition: whether by building massive databases of common-sense knowledge and heuristic rules of thumb for machines to reference (e.g., Doug Lenat’s “Cyc” project), or by hand-coding complicated if-then trees for cognition (the “expert systems” fad). Indeed, go back to the writing of so many people who’ve theorized about AI, whether in fiction or nonfiction, and this is broadly the idea: you build a computer that’s very good at deterministic, rules-based number-crunching and manipulation of data, and eventually it somehow “wakes up” into something that is a person, too, but perhaps still somewhat stilted and unfeeling (e.g., Isaac Asimov’s positronic calculating robots, Mr. Data on Star Trek: The Next Generation). In short, we thought that designing and building AI would be the ultimate triumph of man’s sense-making, categorizing, and calculating powers.
We were completely wrong.
Over the past few years, the alternate perspective that Rich Sutton called the “Bitter Lesson” has emerged: all the domain-specific rule-making cognitive work that seemed so appealing at first was, ultimately, bullshit; instead, just stack more compute, stack more layers in your model’s network, and scale up general-purpose search and learning. Everyone in the field (more or less) now accepts that we just throw more compute and training at the AI models, and they get smarter and better. Everyone in the field (more or less) now accepts that we’re creating odd mathematical blobs of matrices of real numbers (called “weights”) that are almost entirely inscrutable to human reason.[1] And don’t take my word for it[2]: the people building and selling the things think that “maybe 3%” (per Dario Amodei, CEO of Anthropic) of them are understandable.
We just pile the combined intellectual works of mankind (training data) onto a rack of very special sand (GPU chips) and pour in the electrical output of several carrier battle groups, and out pops a thing that is only sometimes able to tell you whether 9.9 is bigger than 9.11, but is always able to write a sonnet in five languages about how to configure API keys for your SaaS product. A robot that could calculate, but not feel? Sure, that’s an average episode of Star Trek. A robot that can feel and create, but fumbles calculation? That was never on the menu.
A decade ago, a Google engineer, in his spare time, created a model that made moody psychedelic sheepdogs, and now we’re here.
The average weekday in my job consists of a third-party research team dropping a paper that says, essentially, “Good news: Plato was right and all of good and evil are correlated in the space of all human ideas”[3] and a bunch of my friends replying with “oh, yeah, everyone inside the companies has known that for six months already, and One Weird Trick can fix the risk of the models Turning Evil from it, would you like the PDF?”
I live in permanent future shock and it’s only going to get worse.
And the people I’m navigating those shock waves with are stranger than you’d expect, too. They’re not the techbro caricatures of HBO’s Silicon Valley, nor really hypercapitalists, though plenty of them work in Silicon Valley. My AI people are…well, mostly they are some of the most incredibly alive people I’ve ever met. They’re quirky contrarians, people who studied AI when it was niche but who now hold up the S&P 500 and define global politics. They’ll casually tell you in one breath that we have a 25% chance of all dying from AI in the next decade, and then offer you a homemade wonder (a piece of chocolate that glows in sunlight) in the next.
Many of them were recruited into their field by reading a very improbable Harry Potter fanfiction; others built the world's best academic conference venue with their own hands. Some of them received funds from a faux philanthropist at the center of what became the largest cryptocurrency scandal in history; others started working in the tech industry so long ago and acquired so many stock options before moving into AI that their financial net worth today is best described as “post-economic.” Some of them have pledged a tenth of their lifetime earnings to save the lives of people whose names they’ll never learn, and intend to die penniless; some of them believe that the Singularity will get here soon enough and they’ll never die at all. They make a surprising amount of music. They’re charmingly kind. Their children are happy.
And many of them — so, so many of them — write fast enough to put Alexander Hamilton to shame.
I don’t think you can communicate all of this in a week’s readings and discussion. It would be unreasonable to expect your participants, folks who’ve lived their whole lives in a world that mostly made sense, to embrace the strange to this degree.
But a syllabus that conveys none of it would be a deep mistake. The strangeness will be here for them, soon, whether they expect it or not.
Disclosures:
Views are my own and do not represent those of current or former clients, employers, friends, Claude Opus 4.5, or my cat.
That being said, this essay was named by Claude Opus 4.5, so.
Footnotes:
1. Indeed, one test reader of this essay asked me why this point was supposed to be strange or surprising, because it’s been so widely accepted for so long in their professional orbit already.
2. There’s a very obvious joke here and I’m not making it. If you know, you know.
3. Emergent Misalignment, a paper which demonstrates that if you fine-tune an AI model to be malicious only when writing computer code, it becomes evil across a wide range of other tasks; it’ll suggest inviting Joseph Goebbels to dinner and poisoning your husband.


