Pysa: An open source static analysis tool to detect and prevent security issues in Python code

Facebook engineering is at it again! Yesterday they released Pysa, a static analyzer that uses data-flow analysis to detect common security issues in Python code.


One flew over the CMS nest

Recently, I was looking for something simple for the more “corporate”-y side of web things. I tried a few PHP-based CMSes. For looks and simplicity, I decided to focus on one of the lesser-known ones (i.e. not a workhorse like Drupal).

At first, I tried to set up SSL traffic between a managed MySQL instance and the CMS instance: there was no easy way to include a certificate, even after digging a bit into the config file. “OK”, I mumbled, “I guess I can live with this and keep my traffic local”.

What killed the deal for me was that, after setting up the DB connection, the CMS decided, and rightly so, to do a self-test. I had not touched anything in the filesystem up to that moment, so I was expecting a warning, but then I got hit by this: “Ensure all the listed files exist and are writable by the server. This normally involves CHMODing them 777”.

I was left speechless. We are in 2020 and software still suggests chmod 777 as part of the installation process. No wonder I promptly deleted the installation: if that is the basic security stance, who knows what evils lurk within?

Sorry, my dear PHP CMS of some popularity, I prefer to keep you at arm's length from now on.
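For contrast, here is a minimal sketch of what a less alarming self-test hint could look like. It assumes the tree is already owned by the web server's user and group (which is the part that normally needs chown and root); the function names list_world_writable and tighten_tree are made up for illustration and are not part of any CMS.

```shell
# List everything under a directory that is writable by "other",
# which is the dangerous part of mode 777. Symlinks are skipped
# because their own mode bits are not meaningful.
list_world_writable() {
  find "$1" -perm -002 ! -type l
}

# Hypothetical helper: directories become 750, regular files 640.
# Assumes the tree is already owned by the web server's user and group.
tighten_tree() {
  find "$1" -type d -exec chmod 750 {} +
  find "$1" -type f -exec chmod 640 {} +
}
```

Running list_world_writable before and after tighten_tree on the install directory shows whether anything is still open to every local user.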


Adventures with Linux Outline Client and aws-iam-authenticator

Hi all,

Below is a small engineering puzzle that I had to solve recently. The essential components:

  • a Linux laptop (in my case, running the excellent ClearLinux distribution)
  • aws-iam-authenticator
  • Outline client (a Shadowsocks client)

The setup was the following: a Kubernetes cluster, with a bastion host using Outline as the means to connect to and access the cluster. In ~/.kube/config you can see the following stanza:



exec:
  apiVersion:
  args: ["token", "--cache", "-i", ""]
  command: aws-iam-authenticator


Issuing commands such as kubectl get pods would fail with a DNS resolution error whenever the Outline Client was enabled. The root cause was that, in our setup, UDP traffic was disabled over Outline. However, Outline would take over /etc/resolv.conf and add an options use-vc line, indicating that ALL DNS resolution should happen over TCP.
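A quick way to check whether the resolver configuration has been rewritten this way is to look for the use-vc option directly. A minimal sketch: the function name has_use_vc is my own, and the file argument defaults to /etc/resolv.conf.

```shell
# Succeeds (exit status 0) if the given resolv.conf-style file has an
# "options" line that includes use-vc, i.e. DNS lookups forced over TCP.
has_use_vc() {
  grep -Eq '^[[:space:]]*options[[:space:]]+(.*[[:space:]])?use-vc([[:space:]]|$)' \
    "${1:-/etc/resolv.conf}"
}

# Usage:
#   has_use_vc && echo "resolver is configured for TCP-only DNS"
```

Running it once before and once after enabling the client makes it obvious whether the file was taken over.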

Under the hood, aws-iam-authenticator communicates with a remote endpoint and attempts to resolve its hostname using UDP. This does not play well with the existing Outline Client setup and eventually fails with an i/o timeout.

The easiest fix I have found is to modify the routing table AFTER the Outline client takes over. For my home network, this looks along the lines of:

sudo route add -host gw wlp2s0

and Presto! DNS resolution works again for aws-iam-authenticator and the kubectl workflow can proceed as normal. I also tried experimenting with

export GODEBUG=netdns=cgo
export GODEBUG=netdns=go

but neither flavor of the resolver honored the use-vc option.
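The routing workaround boils down to a single host route made of three pieces: a destination that should bypass the tunnel, the local gateway, and the interface. Below is a dry-run sketch that only prints the command. The helper name host_route_cmd is hypothetical and the addresses are placeholders from the documentation ranges, not the values from my network; which destination you need depends on your own setup.

```shell
# Dry run: print the host-route command instead of executing it.
# Arguments: destination host IP, gateway IP, network interface.
host_route_cmd() {
  printf 'sudo route add -host %s gw %s %s\n' "$1" "$2" "$3"
}

# Placeholder addresses (documentation ranges), wlp2s0 as in the post.
host_route_cmd 203.0.113.53 192.0.2.1 wlp2s0
```

Printing first and pasting afterwards is a cheap safeguard, since a wrong route added while the tunnel is up can cut off connectivity.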

Hope this is helpful to other people! Until next time!


Running Binary Ninja under WSL

These days, I have access to a Windows 10 laptop, as opposed to my usual set of tools. One feature of Windows 10 that I really like is Windows Subsystem for Linux, or WSL for short. In case you have not followed the developments, it is a way to run native Linux executables under Windows, without the need for a full blown virtual machine (again, more info on the link provided). WSL is under constant development and it looks like a promising technology.

If you have been doing software reverse engineering, chances are you have heard about Binary Ninja, a reversing platform (and if not, click on the link ASAP, you are missing out). Binary Ninja can run natively on Windows, OSX and Linux. Provided you have the (cheap!) professional version you can also run it in headless mode. The following post will show you how you can run Binary Ninja in both GUI and headless mode under WSL.

I decided to take advantage of some sales in MSFT Windows Shop, so I am using Pengwin as my WSL distribution and X410 as my add-on X server, for a total cost of $20.00. In case you do not want to spend any money (free as in beer) or prefer FOSS alternatives (free as in speech), this list will provide a lot of alternatives, but as always YMMV.

BinaryNinja comes as a zip file. Unzip it at a location of your choice; for the sake of discussion I will assume that it is $HOME. Attempting to run it as-is will result in an error message similar to the one below:

qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: xcb.
[1] 217 abort (core dumped) ./binaryninja/binaryninja

This is because the DISPLAY variable is not set by default. Edit your shell’s rc file (I use the excellent zsh so my rc file is ~/.zshrc, for bash it is ~/.bashrc) and add:

export DISPLAY=

Retry running and voilà! You have GUI Binary Ninja running.

Attempting to run Binary Ninja in headless mode will give the following error:

ModuleNotFoundError: No module named 'binaryninja'

[EDIT] The canonical solution to this is to go to your Binary Ninja install directory and run the scripts/ Python script. By doing this, a .pth file is added. Solution courtesy of Binary Ninja Slack (which is a great community to join).

Another solution is to set PYTHONPATH if it is not set already, or extend it if it is. Edit your shell’s rc file and add the following one-liner that takes care of both cases (adjust for your BinaryNinja installation directory):

[[ ! -z "$PYTHONPATH" ]] && export PYTHONPATH=$PYTHONPATH:$HOME/binaryninja/python || export PYTHONPATH=$HOME/binaryninja/python

Running one of the example programs should work now.
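One caveat with the one-liner above: since it lives in the rc file, every nested shell appends the same directory again. A minimal sketch of a duplicate-safe variant, assuming a bash- or zsh-like shell; the function name add_to_pythonpath is made up for illustration.

```shell
# Append a directory to PYTHONPATH unless it is already listed.
# Handles both the unset/empty case and the already-set case.
add_to_pythonpath() {
  case ":${PYTHONPATH:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$1" ;;
  esac
}

# Adjust for your BinaryNinja installation directory.
add_to_pythonpath "$HOME/binaryninja/python"
```

The ${PYTHONPATH:+...} expansion only emits the existing value and the colon separator when the variable is non-empty, so there is no stray leading colon.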

Personally, I made the transition to Python3 (and if you need some reasons, have a look here). BinaryNinja by default uses a Python2 provider – to change this to a Python3 provider, as provided by Pengwin, open advanced settings and select:


These are the basic steps for my BinaryNinja/WSL setup. If something is missing, feel free to drop a comment or a line anytime and I will update the post accordingly. Happy reversing!

[EDIT] Important update: update to version 1.1.1689

After updating to the latest stable version, I was greeted with the following error:

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: xcb.
[1] 63 abort (core dumped) ./binaryninja/binaryninja

In order to debug this properly, you can set the environment variable QT_DEBUG_PLUGINS to 1, as shown below:

export QT_DEBUG_PLUGINS=1


Attempting to rerun the app, we see the following (snipped for brevity):

Got keys from plugin meta data ("xcb")
QFactoryLoader::QFactoryLoader() checking directory path "/home/orly/binaryninja/platforms" …
Cannot load library /home/orly/binaryninja/qt/platforms/ ( cannot open shared object file: No such file or directory)
QLibraryPrivate::loadPlugin failed on "/home/orly/binaryninja/qt/platforms/" : "Cannot load library /home/orly/binaryninja/qt/platforms/ (libxkbco cannot open shared object file: No such file or directory)"
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem

Now we are getting somewhere, as we have the name of the missing dependency.

sudo apt-get install -y libxkbcommon-x11-0

… and BinaryNinja works again!
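As a complement to QT_DEBUG_PLUGINS, ldd can list the shared libraries a plugin needs and flag the ones that cannot be resolved. A small sketch of a filter for that output; the name missing_libs is my own and the plugin path in the usage comment is a placeholder.

```shell
# Read ldd-style output on stdin and print the names of the
# libraries that the dynamic loader could not resolve.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage (placeholder path, adjust for your installation):
#   ldd /path/to/plugin.so | missing_libs
```

Each name it prints is a candidate for a package search in your distribution.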


Book Review: Managing Kubernetes

As 2018 comes to a close, one fact can be pointed out: Kubernetes is the winner of the container orchestration framework “war”, short-lived as it was. The popularity of the project is growing steadily and it is being adopted by a variety of businesses, from the small but technologically adept startup to large, multi-department enterprises. The “big three” cloud providers offer managed Kubernetes versions, and there is a steadily growing ecosystem taking care of the various needs that end users might have. This is reflected in the relevant technical literature, where most major publishing houses have released Kubernetes-related books. The focus of these books is usually migrating an application stack to Kubernetes, or the basic Kubernetes building blocks. The book chosen for this review is different: it is aimed at system engineers (or “DevOps” or “SREs” or whatever term/methodology your organization is using) and strives to provide actionable information on managing Kubernetes clusters. Let’s start with the ToC:

  • Introduction
  • An Overview of Kubernetes
  • Kubernetes Architecture
  • The Kubernetes API server
  • Scheduler
  • Installing Kubernetes
  • Authentication and User Management
  • Authorization
  • Admission Control
  • Networking
  • Monitoring Kubernetes
  • Disaster Recovery
  • Extending Kubernetes
  • Conclusions

As can already be seen from the ToC, this book is a departure from the usual Kubernetes literature, most of which concerns itself with application deployment and the application lifecycle on the cluster. Another interesting fact is that the book assumes that you run a self-managed Kubernetes cluster; the peculiarities of running Kubernetes-as-a-service are not covered. However, the knowledge contained in this book will be useful even when troubleshooting managed installations. From the introduction, it becomes evident that the focus of the book is to prepare the reader to respond when things do not work as expected (or plainly go wrong), to fine-tune and optimize clusters and, finally, to extend the system with custom or new functionality.

Once the objectives are set, the book gives an essential introduction to Kubernetes application building blocks, starting from the bare essentials, like Pods and ReplicaSets and progressing to more advanced topics. However, as discussed, this is not the focus of the book so the section is quite short.

In the next chapter, the basics of Kubernetes architecture (which in itself is a distributed system) are laid down. The chapter is broken down into Concepts, Structure and Components, each of which gets a concise, yet information-packed, treatment. With that out of the way, the book moves on to the Kubernetes API Server, describes its structure in detail (which should not be that unfamiliar to those who have worked with well-designed REST APIs before) and ends with some debugging tips. This is followed by a short section on scheduling; while only a high-level overview of the algorithm itself is given, we at least get a treatment of affinities, taints and the rest.

As stated before, Kubernetes is itself a distributed system. There is more than one way to install it; the book chooses to focus on kubeadm, a sane choice given that quite a few installation tools use kubeadm under the hood.

The next two chapters deal with Authentication and Authorization (A1 and A2 for all you infosec acronym fans). Kubernetes itself supports a few authentication mechanisms, with client certificates likely to be the most common, and the most important ones are examined. With that out of the way, the book discusses, again in a concise way, Kubernetes authorization and admission control. While authorization has kind of settled down by now, admission control is a rapidly moving target. This is pointed out properly by the authors, who pass along enough knowledge for someone to be able to keep up with the topic.

The next chapter dives into one of the most important, yet often overlooked, aspects of Kubernetes: networking. As stated already, Kubernetes is a distributed system and networking between components is a crucial aspect. Again, we get a concise treatment of the Container Network Interface and Service Discovery, and even service meshes get a mention at the end of the chapter.

Running a cluster means you should have an x-ray into the cluster, thus the next chapter is concerned with monitoring a Kubernetes cluster, and also gives an introduction to monitoring applications inside the cluster. This should give the operator enough insight into the cluster health at any given moment. What to do when the cluster health is not nominal is the topic of the penultimate chapter, namely disaster recovery. When running a distributed system, a lot of different aspects of the system can fail and, as an operator, you should be prepared to respond and restore the functionality of the cluster to a well-defined state. Certain failure points and tools for recovery are discussed. The book closes off with a chapter on extending Kubernetes, somewhat surprisingly using JavaScript as the language for the hands-on example.

Overall, this is a must-have purchase if you operate Kubernetes clusters or if you are interested in how Kubernetes operates under the hood. The book tries to cover a lot of material in a relatively short 170 pages, so it remains concise at all times; the material covered could easily fill twice or even three times as many pages. However, given that this is a pioneering book and that it is written at a higher level of abstraction that extends its lifetime (something really important when dealing with a rapidly moving target such as Kubernetes), I would recommend it, and not just on the basis of the impressive credentials of the authors. I really look forward to this book being used as a seed for a series of further literature (printed or electronic) dealing more with the operational dimensions of Kubernetes.


Book Review: Database Reliability Engineering – Designing and Operating Resilient Database Systems

Hello and welcome to yet another book review. Databases have been called the “killer application of IT” and it is true that in almost any computing environment today, one or more databases are in play. In-depth knowledge of these database systems used to reside with the DBAs of an organization. Today, with roles being in flux, if you are an SRE chances are you have to deal with databases, quite often without the luxury of a dedicated DBA. Databases themselves have proliferated as well, with the NoSQL paradigm entering the market and various CAP theorem trade-offs in effect, depending on the use case. So, it was about time that a dedicated volume appeared on the market that deals with how to apply SRE principles within a database context. Let’s start with the table of contents:

    1. Introducing Database Reliability Engineering
    2. Service-Level Management
    3. Risk Management
    4. Operational Visibility
    5. Infrastructure Engineering
    6. Infrastructure Management
    7. Backup and Recovery
    8. Release Management
    9. Security
    10. Data Storage, Index and Replication
    11. Datastore Field Guide
    12. A Data Architecture Sampler
    13. Making the case for DBRE

Substituting Reliability Engineer for Administrator gives this book a distinct flavor. REs (be it SREs or DBREs) come from the software domain and strive to apply software engineering principles to the operational domain, eliminating toil as they go. In addition, a cornerstone of RE is interfacing with other domains (software engineering, network engineering and yes, DBA come to mind). Thus, from the get-go, the book stresses that, while the technical aspects of the book might already be known to a good DBA, there are organizational and cultural aspects to be considered as well (as in, “tear down these silos”).

The book kicks off with an introduction to the concepts that will be discussed, including a Maslow-like hierarchy of DBRE needs (the authors point out that it is totally fine to move between levels at will/need), and moves on to introduce traditional SRE concepts such as SLOs, how they apply to the real world and, more importantly, how they evolve. Risk and how to manage it gets a treatment, as does operational visibility and how to define it. No treatise on the subject would be complete without a discussion of the underlying infrastructure concepts. Recent developments such as containerization get accurate and fair coverage, as do more traditional approaches. Backup and recovery gets an extensive chapter, as this is perhaps the most important topic when dealing with databases (for certain companies, large-scale data loss could mean end of business, period). Release management, including CI/CD for databases, is discussed, signifying the application of principles carried over from the software domain to the database world. Once all these topics are discussed, there is a chapter on security, including well-known attacks such as SQL injection (which we should have gotten rid of by now!) and mitigations, including judicious use of cryptography. These chapters, in my opinion, form the first dimension of the book, which tends to be quite operations-heavy (and rightly so). The book then makes a foray into more traditional territory, discussing topics such as replication topologies, a datastore field guide and architectural patterns for distributed databases, and finally closes off quite nicely with a chapter on DBRE culture.

Now that we have an overview of the structure of the book (and it is a really well-structured book), the big question is “does it deliver?”. In my opinion, yes: the authors keep a nice conversational style in what could have been quite dry writing. The authors are well-known figures in the SRE (or is it DBRE?) world and they splice the text with quite a few anecdotes and external examples. Also, the need for proper visibility and traceability is brought front and center (in fact, the notion of establishing SLOs is centered around measurable data); I really liked that touch. The human factor, which more often than not tends to be overlooked, is discussed in a few places in the book. Even skimming through the book (or speed-reading it) can yield results, given that there are a lot of visual aids. Another nice touch is that DREAD and STRIDE are covered in the discussion of security; it is nice to see these mentioned outside of infosec-specific literature. The first chapters, as said before, are ops-heavy and they contain a wealth of information even for seasoned reliability engineers (at the very least as a refresher), while later chapters deal more with data, helping the reader to navigate the ever-increasing sprawl of database solutions.

Overall, I would recommend this book to anyone, regardless of skill level, who has to deal with databases in everyday work. This short review might not really do justice to the book; in every chapter (even the introductory one) there are broad discussion topics that one can have really detailed conversations about. In closing, the approach of the authors to apply Reliability Engineering practices in the database world is a valid one: if the advice and methodology contained in the book are followed, a lot of headaches will be preemptively removed and everybody (engineers, owners and customers) will be happy. The book lends itself to repeated readings, be it back-to-back or specific chapters, and I cannot recommend it enough.


Charity Majors: The Engineer’s Bill of Rights (and Responsibilities)

The following post was originally written by Charity Majors and was translated into Greek with her permission; it is presented here in English. For the original post in English, click here. Additionally, I am NOT a professional translator so, while I did my best, feel free to drop me a line or a comment if something is mistranslated. With these out of the way, let’s go!
Power has a way of flowing towards people managers over time, no matter how many times you repeat “management is not a promotion, it is a career change”.
It is natural, like water flowing downhill. Managers have access to performance reviews and other personal information that they need to do their jobs, and they tend to be more skilled at communication. Managers facilitate a lot of decision-making and the routing of people, data and things, and it is very easy to slip into making all the decisions rather than empowering people to make them themselves. Sometimes you just want to hand out assignments and order everyone to do as they are told (eh? just me??)
But if you let all the power slide over to engineering managers, pretty soon it is not so great to be an engineer. Now you have people becoming managers for all the wrong reasons, or everyone saying they want to become a manager, or engineers simply checking out and just turning in their work (or quitting). We all want autonomy and impact, we all crave a seat at the table. You need to work harder to keep those seats for non-managers.
So, in the spirit of the rights and responsibilities of the Constitution, here are some of the commitments we make to our engineers at Honeycomb, and some of the expectations we have for managerial and engineering roles. Some mirror each other, and others are very different.
(Incidentally, I find it very helpful to draw the org chart upside down, placing managers below their teams, as a support structure rather than hanging off the top.)

Engineer’s Bill of Rights

  • You should be free to put your head down and focus, and trust that your manager will gently let you know when you are needed (or when you would want to be included).
  • Technical decisions must come from engineers, not from managers.
  • You deserve to know how well you are performing, and to hear early and often if you are not meeting expectations.
  • On-call should not significantly affect your life, sleep, or health (beyond carrying your devices with you). If it does, we will fix it.
  • Code reviews should be turned around in less than 24 hours, under normal circumstances.
  • You should have a career path that challenges you and contributes to your personal life goals, with the support and guidance you need to get there.
  • You should choose your own work, with your manager’s advice and based on our business goals. This is not a democracy, but you will have a voice in our planning process.
  • You should be able to do your work in and out of the office. When you work remotely, your team will communicate with you and support you.

Engineer’s Responsibilities

  • Make progress on your projects every week. Be transparent.
  • Make progress on your career every quarter. Push your limits.
  • Build a relationship of trust and mutual vulnerability with your manager and your engineering team, and invest in that relationship.
  • Know where you stand: how well are you performing, how quickly are you growing?
  • Develop your technical judgment and leadership skills. Take ownership and be accountable for engineering outcomes. Ask for help whenever you need it, and give help whenever you are asked.
  • Give feedback early and often, receive feedback gracefully. Practice saying “no” and hearing “no”. Let people take things back and try again if something does not come out right.
  • Take ownership of your time and actively manage your calendar. Spend your attention credits carefully.

Manager’s Responsibilities

  • Recruit, hire and train your team. Foster a sense of solidarity and team spirit, as well as real emotional safety.
  • Care for every engineer on your team. Support them in their career trajectory, personal goals, work/life balance, and dynamics within and across teams.
  • Give feedback often and early. Receive feedback gracefully. Always say the hard truth, but with love.
  • Move us relentlessly forward, watching out for overengineering and work that does not contribute to our goals. Ensure redundant coverage of critical areas.
  • Take ownership of your team’s quarterly planning process and be accountable for the goals you set. Allocate resources by communicating priorities and recruiting engineering leads.
  • Add focus or a sense of urgency where needed.
  • Take ownership of your time and attention. Be available. Actively manage your calendar. Try not to make your emotions other people’s problem (but lean on your manager and your peer group for support).
  • Prioritize your own personal growth and self-care. Model the values and traits we want our engineers to follow.
  • Stay vulnerable.

I would love to hear from anyone else who has a list like this.


Article Review: Containers will not fix your broken culture (and other hard truths)

First things first: if you do not know what ACM Queue is (or even worse, do not know what ACM is), click on the links provided. ACM has relatively recently reformed and now presents articles by industry experts, especially in Queue magazine (you get an article from Queue with every Communications of the ACM issue, but there is more, much more). Disclaimer: while I am a paying ACM member, I make no profit from and have no further affiliation with the organization (i.e. I am not an official Ambassador).
With that out of the way, let’s focus on the article in question. The author is Bridget Kromhout, currently working for Microsoft. The main idea of the article is that difficult, seemingly technical problems are often best resolved by examining the interactions with others. The main ideas discussed therein are the following:

  • Tech is not a panacea
  • Good team interactions: Build, because you can’t buy
  • Tech, like Soylent Green, is made of people
  • Good fences make good neighbors
  • Avoiding sadness-as-a-service

The article is extremely well written. One thing I liked the most is that it includes links to definitions of terms you might or might not have heard of. The key takeaway of the article is that we tend to think in terms of technology and enforce technology rules in an increasingly complex distributed systems world, whereas the key is communication between individuals and teams, peers or otherwise. It also coins a phrase that unfortunately will ring true for a lot of people in the audience of this blog, “on-call PTSD”, and even manages to kill one of my favorite interview questions, and these are only the first two pages. The article also states “we succeed when share responsibility and have agency”. Amen to that; personally, I have seen more than a few dysfunctional environments where responsibilities were shrugged off routinely. So to sum it up (and keep this review proportional to the length of the article), Bridget states the value of communication, brings in a ton of references to support her case (making the article well researched without falling into the trap of being esoteric) and, at the same time, emphasizes the need for technology. Highly recommended reading!


Book Review: The Practice Of Cloud System Administration Volume 2 – Designing And Operating Large Distributed Systems

Hello everyone, and welcome to another book review. This time, I will be reviewing a book that I consider a classic. As always, let’s start with the list of contents:
Part I Design: Building it

  • Designing in a distributed world
  • Designing for Operations
  • Selecting a Service Platform
  • Application Architectures
  • Design Patterns for Scaling
  • Design Patterns for Resiliency

Part II Operations: Running it

  • Operations in a Distributed World
  • DevOps Culture
  • Service Delivery: The Build Phase
  • Service Delivery: The Deployment Phase
  • Upgrading Live Services
  • Automation
  • Design Documents
  • Oncall
  • Disaster Preparedness
  • Monitoring Fundamentals
  • Monitoring Architecture and Practice
  • Capacity Planning
  • Creating KPIs
  • Operational Excellence

Part III Appendices

  • Assessments
  • The Origins and Future of Distributed Computing and Clouds
  • Scaling Terminology and Concepts
  • Templates and Examples
  • Recommended Reading

Overall, a bit over 500 beautifully printed pages (as you would come to expect from Addison-Wesley).
As you can see from the ToC, the breadth of information contained in this book is tremendous. Every chapter could easily expand into a book of its own (and indeed, there are volumes that expand on a lot of the topics), yet this book manages to give the astute reader a ton of information; heck, it is almost like the information is condensed: just add water. The authors do not fall into the trap of sticking with a particular technology. They maintain a level of abstraction that, in my opinion, is about right: not too abstract (which would limit the potential of the book to be applied in real-world situations) and yet not tied to a particular technology, which would instantly and severely date the book (this book came out before container orchestration frameworks became as popular as they are today, but you will not notice). The format is similar for all chapters: first an attention-grabbing introduction, then a nice discussion of the topic at hand and finally exercises, most of them open-ended, so the reader can follow up on what has been discussed. After all, large-scale distributed systems have a common set of characteristics, no matter what the implementation details or purpose are.
The potential audience of this book is both SREs and their managers. In particular, Part II of the book contains a ton of information relevant to both sides of the equation. If you manage SREs, you’d better be at least acquainted with the material, and this book is more than a fine introduction. If you need a book on how to use AWS/Azure/GCP or their specifics, this volume will NOT meet your expectations; as discussed, this book is more like a framework.
In case this is not obvious by now, I consider this book a must-read for anyone dealing with modern distributed systems, be it SRE, SWE or Engineering Manager. I cannot praise this book enough: it is extremely well written, in certain cases it goes against the trends, and how can you go wrong with a book that considers a zombie outbreak a valid reason for a datacenter outage?
Further resources:
Companion Website
Thomas Limoncelli’s Twitter

PS. A book that everybody is recommending (and asking me about, in a variety of contexts) is Google’s SRE book. If you have not read this book by now, then you can start by going there to enjoy the book in its entirety. While the Google SRE book is an extremely useful resource, and without wanting to create a false dichotomy, it kind of overshadows this volume, which, in my humble opinion, is a better choice in certain regards. Specifically, while both books have a strong Google influence (one is coming from Google, the author of the other was a Google SRE), I find that the “Practice of …” is a more focused volume, something perhaps to be expected given that it is written by “only” three authors. So, do yourself a favour and read both books; there is a wealth of information contained therein.


So you wanna be a programmer?

This (almost lost in time by now) article reflects my opinion towards the whole “Become an engineer in 3 months/weeks/whatever” bootcamps and whatnot. While the article has an infosec focus and the list of books is semi-outdated by now, you get the gist …
PS. Obviously I am not ${tenex}