The Petrov Experience, or why not to automate everything

I’m a big fan of automation. Probably all good devops are, and maybe a bunch of good old sysadmins too. But everything in the world has limits. A couple of days ago I discussed with a colleague whether it is convenient or not to fully automate system actions.

While I agree that automation is a good thing for repetitive, non-critical work, I’m more conservative about relevant or critical actions. Our discussion was about automatic database failover (the database being the core of the company and ultimately critical) versus keeping the failover manual, even if that implies waking up in the middle of the night.

Today I want to share a not very famous story, which happened in 1983, but which explains quite well why some actions must stay manual. Let’s start with the Petrov story.

It is September 26, 1983. The world is quite nervous these days. The Soviet Union has recently shot down a Korean civilian airliner for straying into Soviet airspace, killing the 269 people on board, and NATO has started military maneuvers near the Soviet border.

NORAD as seen in the film WarGames.

In this climate, Lieutenant Colonel Stanislav Yevgrafovich Petrov (Станисла́в Евгра́фович Петро́в) started his turn of duty that night. He was responsible for the Soviet detection system Oko, the automatic system in charge of detecting nuclear launches from American bases and, if a launch was detected, starting the counter-attack (which in that age meant starting a nuclear war).

At some point during the night, the main panels on the wall started blinking, the alarm began to sound and the system confirmed a launch from an American base. Now Petrov had to make a decision: either wait to be sure the launch was real, or start the counter-attack. But he didn’t have much time. If an attack was on the way, every second counted to organize the defense and the response. If not, he would probably be starting the last war of humanity.

A couple of minutes later, the system started to buzz again. Another launch was detected, and within a few minutes three more launches followed. In total, five nuclear missiles appeared to have been launched from the US against the Soviet Union. The Oko system was considered very reliable: it used spy cameras on a satellite constellation that had been orbiting without problems for years, and its data was analyzed in three separate data centers with no common element between them.

From a technical point of view, the threat was real. The satellites (a very sophisticated piece of technology for that age) detected five missile launches (not just one), and three isolated data centers came to the same conclusion. Nothing appeared to be wrong with the system… so… what was the logical reaction?

At that point, human common sense entered the equation. Petrov thought (and keep in mind that this thought probably avoided a nuclear war): “no one starts a war with only five missiles”. So he decided not to start the counter-attack, not to inform the high military command, and to just wait until the radars on the Soviet border could confirm or disprove what the system was saying.

Fortunately for all of us, Petrov was right. No one starts a war with five missiles. The border radars confirmed that there were no missiles at all, and everything had been a terrible mistake of the system. Specifically, a rare alignment of the Sun, the satellites and the Earth made sunlight reflected on high-altitude clouds look to the system exactly like a launch does, and that reflection happened to appear right over a military base which did, in fact, host nuclear missile silos.

Stanislav Yevgrafovich Petrov in uniform. Credits to unknown author.

The point here is: what would have happened if this decision had been made by an automatic process? Probably neither you nor I would be here now.

By the way, in 2004 Petrov received an award for his help in maintaining the peace, along with a prize of (sic) 1000 US dollars. His family died without knowing what had really happened that day, because the incident was classified as top secret for years.

Poor man’s containerization


In the last few months, process containerization has become the new virtualization for modern devops.

Of course we are old devops, you know, and there is nothing special in containerization that we weren’t already using years ago. There are some poor man’s alternatives to the new tools, like Docker or Vagrant, but in the old-school way.

The forgotten chroot

Years ago chroot fell into oblivion for no specific reason. The truth is that we can use chroot as a decent form of containerization if we don’t need copy-on-write or network capabilities. It is a very portable approach which requires only root privileges, and no special capability enabled in the kernel config (very useful on restricted VPSes).
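
As a minimal sketch of the idea (assuming a prepared root filesystem at the hypothetical path /srv/jail, with a shell inside it), a chroot “container” boils down to a couple of system calls:

    /* chroot_jail.c - minimal chroot "container".
     * Build: gcc -o chroot_jail chroot_jail.c ; run as root.
     * Assumes /srv/jail holds a usable root fs (e.g. /srv/jail/bin/sh). */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *jail = "/srv/jail";   /* hypothetical path, adjust to yours */

        if (chroot(jail) != 0) {          /* move the process root into the jail */
            perror("chroot");
            return EXIT_FAILURE;
        }
        if (chdir("/") != 0) {            /* avoid escapes through a stale cwd */
            perror("chdir");
            return EXIT_FAILURE;
        }
        execl("/bin/sh", "sh", (char *)NULL);  /* a shell locked inside the jail */
        perror("execl");
        return EXIT_FAILURE;
    }

Populating the jail (for instance with debootstrap on Debian-like systems) is left as an exercise.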

There are also a number of non-root alternatives based on ptrace, like proot. The use of ptrace is deep enough to deserve an article of its own. Stay tuned!

LD_PRELOAD

You can do very interesting things with the LD_PRELOAD variable. If it is set, the GNU dynamic linker loads the library named in the variable into the process context first, so its symbols take precedence when linking. That means you can override functions like open(2) or write(2). This way you can implement an easy-to-use copy-on-write system which does not require anything special: no root privileges, no special kernel configs.
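
As a toy sketch of the mechanism (not fl-cow itself; the name logopen.so is made up for the example), the following preloadable library intercepts open(2), logs the path, and forwards the call to the real libc implementation. A copy-on-write layer would copy the file at that point instead of just logging:

    /* logopen.c - toy LD_PRELOAD interposer for open(2).
     * Build: gcc -shared -fPIC -o logopen.so logopen.c -ldl
     * Use:   LD_PRELOAD=./logopen.so cat /etc/hostname */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <stdio.h>

    int open(const char *path, int flags, ...)
    {
        /* find the "real" open in the next object in search order (libc) */
        int (*real_open)(const char *, int, ...) =
            (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

        mode_t mode = 0;
        if (flags & O_CREAT) {        /* third argument exists only with O_CREAT */
            va_list ap;
            va_start(ap, flags);
            mode = va_arg(ap, mode_t);
            va_end(ap);
        }

        fprintf(stderr, "open(%s)\n", path);  /* a CoW layer would copy the file here */
        return real_open(path, flags, mode);
    }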

Of course there are a number of implementations of this idea. My favorite one is fl-cow, which comes as a package officially maintained in Debian and Ubuntu.

unshare

The “new” member of the system call family since Linux 2.6.16 is unshare(2), which comes with the user-space tool unshare(1). The unshare function allows a process to disassociate parts of its execution context; that means you can run a process with a different filesystem view, for example. It’s very useful when you need to handle mount points for your “containers”.
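
A minimal sketch of the idea, run as root: after unshare(CLONE_NEWNS) the process lives in its own mount namespace, so it can mount a private /tmp that the rest of the system never sees:

    /* private_tmp.c - a process with its own mount namespace and private /tmp.
     * Build: gcc -o private_tmp private_tmp.c ; run as root. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mount.h>
    #include <unistd.h>

    int main(void)
    {
        if (unshare(CLONE_NEWNS) != 0) {      /* detach into a new mount namespace */
            perror("unshare");
            return EXIT_FAILURE;
        }
        /* stop our mounts from propagating back to the host namespace */
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
            perror("mount MS_PRIVATE");
            return EXIT_FAILURE;
        }
        /* this tmpfs over /tmp is visible only inside this namespace */
        if (mount("tmpfs", "/tmp", "tmpfs", 0, NULL) != 0) {
            perror("mount tmpfs");
            return EXIT_FAILURE;
        }
        execl("/bin/sh", "sh", (char *)NULL); /* shell sees the private /tmp */
        perror("execl");
        return EXIT_FAILURE;
    }

The rough user-space equivalent is unshare -m /bin/sh.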

My favorite tool to handle unshare, clone and friends is dive, a tool created by Vitaly Shukela which allows you to run a process with different mount points and other capabilities, like cgroups or network namespaces, which we will see in the next section.

Network namespaces

Since kernel 2.6.24, Linux has the ability to create network namespaces. A network namespace is a way to create separate network adapters and routing tables bound to a process context, so your process can handle a “virtual” interface in a simple way.

Scott Lowe wrote, some years ago (nothing new here), a really good introduction to network namespaces in GNU/Linux using iproute2.

With namespaces you can easily define a number of virtual hosts with connectivity between them (for example over veth pairs), so your pseudo-containers can use the network. It’s very useful when you need to test master-slave configurations.
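
To illustrate the isolation (a sketch, run as root since creating a namespace needs CAP_NET_ADMIN), this program jumps into a brand new network namespace and lists the interfaces it can see; no matter how many NICs the host has, only a fresh loopback shows up:

    /* netns_demo.c - enter a fresh network namespace and list its interfaces.
     * Build: gcc -o netns_demo netns_demo.c ; run as root. */
    #define _GNU_SOURCE
    #include <net/if.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        if (unshare(CLONE_NEWNET) != 0) {     /* create and enter a new netns */
            perror("unshare(CLONE_NEWNET)");
            return EXIT_FAILURE;
        }
        /* only a brand new (and still down) loopback should show up */
        struct if_nameindex *ifs = if_nameindex();
        if (ifs == NULL) {
            perror("if_nameindex");
            return EXIT_FAILURE;
        }
        for (struct if_nameindex *i = ifs; i->if_index != 0; i++)
            printf("%u: %s\n", i->if_index, i->if_name);
        if_freenameindex(ifs);
        return EXIT_SUCCESS;
    }

From there, veth pairs created with iproute2 (ip link add ... type veth) are the usual way to wire namespaces to each other or to the host.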

Conclusions

Of course, containerization is one of the most active areas in devops today. A lot of good developments like Docker are emerging on the horizon, but if you don’t need a more complex system, these solutions can help you. Furthermore, most of these principles are at the base of how modern containerization systems actually work.