Hi,
So here are some things I’ve been thinking about this week.
So it turns out that the internet is duct-taped, barely
As many Friends of the ‘Stack know, I am an inveterate social media addict. (Though not as much as, well, certain of you Friends, you know who you are…) So when Facebook, Instagram, and WhatsApp went down this week, many non-technical commentators feared the worst:
But, well, as almost everyone who has to deal with tech stuff regularly assumed from moment one, it wasn’t a cyberattack. It wasn’t Facebook trying to cover up a regulatory issue, or distract from an upcoming Congressional hearing.[1] It was just, well, a version of a problem that happens to websites every day. As the incomparable Vallery Lancey (one of the very nicest people who helped lead life-saving work for VaccinateCA, and a literal world expert on the scaling of large computing networks) explains:

It’s just that most websites that have this issue aren’t, well, used by a third to half of all humans on Planet Earth, so we tend not to notice. And indeed, Facebook’s public reporting on the event confirms that this was exactly the case:
During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally. Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command.
All of this happened very fast…
Our primary and out-of-band network access was down, so we sent engineers onsite to the data centers to have them debug the issue and restart the systems. But this took time, because these facilities are designed with high levels of physical and system security in mind. They’re hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them. So it took extra time to activate the secure access protocols needed to get people onsite and able to work on the servers. Only then could we confirm the issue and bring our backbone back online.
Once our backbone network connectivity was restored across our data center regions, everything came back up with it. But the problem was not over — we knew that flipping our services back on all at once could potentially cause a new round of crashes due to a surge in traffic. Individual data centers were reporting dips in power usage in the range of tens of megawatts, and suddenly reversing such a dip in power consumption could put everything from electrical systems to caches at risk.
So, we know what happened, and it was spectacularly unlikely on its own.
But it was still a disaster.
Let me explain.
***
You see, for folks who live in big, first-world urban areas, Facebook, Instagram, and WhatsApp are just…well, just websites. They’re useful, and good, but not the only game in town.
In much of the rest of the world, that’s different.
Even throughout much of rural America, Facebook is the government website. You’d be shocked (sincerely) how many counties in the rural South have no web presence beyond their Facebook page. For everything from vaccine information to crime reports to the school calendar, it’s all on Facebook pages. Only.
(Yes, really. Yes, I know that’s insane.)
Much of that isn’t time-critical, and a day offline can be endured without much risk.
But in much of the developing world, it’s different. Imagine that the only way you could find out if your produce got sold at market was WhatsApp. Imagine that produce is worth most of your liquid cash. Imagine you didn’t know what happened to it for a day. That’s…pretty bad. Imagine that you were counting on using WhatsApp to coordinate meeting your kid, or your parent, or anyone else you love, in a busy city. Imagine you literally don’t have their phone number. The consequences can be real, and serious.
***
So here we are. Everything is duct-taped to everything else, and even the most successful companies in the world can fail.
On the surface, we have lots of capacity in our institutions. Below the surface, they all rely on a few hundred engineers, and their ability to avoid falling into a rare series of bugs. Even with all of the precautions in the world — even with systems that check systems that check systems — something really bad happened.
Now, there are two lenses you can take on this. One is: “ohmygoodness, this is too much risk to pile up in one place; when you’re rolling dice every day, eventually they come up snake eyes, no matter how lucky you are.” And you’re not wrong.
The other is this: Facebook, within a day, provided detailed accountability on what went wrong. They’ll, as a matter of social custom among network engineers, almost certainly provide much more information in the weeks and months to come, and probably give talks and write papers post-morteming it in detail. They’ll not only rub their own noses in their own duct-taped disaster, but emerge stronger and smarter from it.
When was the last time you saw any other disaster treated with not only that seriousness, but with that speed of disclosure? What would it mean if you did?
Humanity unlocks Vaccines III on tech tree
Today, one of the most important pieces of news you’ll ever hear was reported.
Mankind finally has an effective vaccine against malaria, and it’s targeted for use for children in Africa. As we roll this out, tens of thousands of kids will live each year that otherwise would be killed by this terrible disease. Hundreds of thousands, maybe millions, of kids will be spared suffering and delays to their education.
And, by the way, this is only the first malaria vaccine to get the WHO’s approval. You know all those COVID vaccines we had? Remember how we got them so quickly? Oh yeah, most of them depended in part on tech the human race was working on to, eventually, fight diseases like malaria: the Oxford-AstraZeneca and Novavax vaccines drew from malaria tech, and basically everyone expects that the mRNA tech from Moderna and Pfizer-BioNTech will be capable of fighting malaria, too.
Humanity’s been suffering from these enemies for thousands of years. And now, we are finally building the tools that will one day let us win this battle.
There is much that is messed up in the world right now. But, Friends, I straight up cried when I read this news. This is the world where we can win. We can win.
Let’s make it happen.
Public service announcement: UPGRADE your COVID defenses before the winter
As many readers of this newsletter know, I’m kind of a fanatic about fighting COVID. Possibly that’s a bit of an obsession. But as a reminder, we’re about to head into cold weather, and indoor time, and it’s likely going to get a little worse here in the Northeast.
Now, we have a bunch of positive factors as we head into winter: it’s quite likely pediatric vaccination will be rolled out in the next month-ish, many cold-weather states have high vaccination rates,[2] mandates are increasing vaccination rates in many workplaces, and we’re booster-ing many of the most vulnerable elderly and immunocompromised populations.
But at the same time, there are still many factors that will likely mean that at some point or another, you’re going to be worried about COVID rates in your community. There are still plenty of unvaccinated folks; you may want to visit friends with young kids who aren’t yet permitted to be vaccinated (or who, even once approved, won’t yet have had enough time to become fully vaccinated); or heck, you just don’t want to get a mild case.
I find that many people haven’t upgraded their defenses in quite a while, beyond getting vaccinated. (And for some folks, that’s a reasonable risk calculation! I don’t judge.) But there are many options you can pursue to reduce your risk, feel a little less worried, and help solve our collective action problem.
If you do want to upgrade, however, I recommend that you:
Swap any cloth masks you have for higher-quality (and more comfortable) procedure masks or KN95 masks; I use Bona Fide Masks as my supplier.
I would also recommend getting a flu shot, so that you have less risk of playing the fun game, “is it the flu, or is it COVID?”
Finally, I recommend that you keep some at-home rapid tests on hand; they’re now available for $14 for a two-pack from Walmart, Amazon, or Kroger. That way, if you get the sniffles, you can have some information right away about how serious it is.
(None of these are referral links. I just think you should get them.)
Oh, and supply chains are gonna get a little wonky and slow. Maybe, if you can, start buying some of your holiday gifts now-ish.
And I’m always here to grab a drink with you at the bar, whenever you feel safe enough to do so.
Disclosures:
Views are my own and do not represent those of current or former clients, employers, friends, or my cat.
[1] I’d really like to sit down and try to understand anyone who has the mindset of, “Ah, yes, Facebook’s highly-paid counsel advised it to demonstrate to a wide range of political actors just how crucial it is to everyday life, right as it is about to be regulated for being too crucial in everyday life.”
[2] Not all of them; I’m not exactly planning a trip to North Dakota at this time of year, however.