Thursday, January 18, 2024

Colorado-style Green Chile

This is a recipe for Colorado-style (as opposed to New Mexico-style) Green Chile. Or Chili; the choice of spelling seems to be random. Anyway, this is a huge deal in Denver and common across Colorado generally, but outside the Southwest it's basically unheard of.


Source Recipe

https://www.splendidtable.org/recipes/colorado-green-chili
This is the recipe I started from. It's written by a professional so you should probably read it first. I've had to adjust it quite a bit though, mostly due to wildly incorrect cooking times.
 

Equipment

  • Cast iron dutch oven, with lid, 5 quarts or larger, oven-safe to at least 325degF. The lid must fit snugly and not have a vent hole.
  • Oven
  • Food processor
  • Bowls and spoons and such, nothing special


Ingredients

  • 3 pounds (+/- to taste) pork shoulder, cut into small cubes
  • Chiles (see below!), approximately 1 pound processed weight
  • 1 quart chicken broth or stock
  • 1 large onion (white! NOT yellow!!), diced (NB: the source recipe says 2, but that will make the final product far too sweet. We want chiles and pork, not onion, to be the dominant flavors.)
  • 8+ cloves fresh garlic, finely minced
  • 1 tablespoon ground cumin
  • 1 can (14oz) diced tomatoes
  • ¼ cup all-purpose flour, +/- depending on textural preference and how much pork fat you end up with
  • Salt, to taste, but expect to use at least 3 tablespoons
  • 1 Lime, juiced. Additional lime juice, or lime wedges for serving, to taste
  • Patience.


On Chiles…

If you can buy chiles directly from a roaster, do so. They'll be delivered to you in a sack; let them steam in it for about an hour, then start peeling and seeding. Wear gloves. This is miserable work that will pay off later. Divide your processed chiles into portions (1lb fits nicely in a quart freezer bag) and freeze.

If fresh roasted chiles are not available in your area, or you have run out and it is not the right season, there are three alternatives:

  1. Purchase processed chiles, either locally (probably CO/NM only) or online. This is expensive but you will get the good stuff, and it will be much less work.
  2. Purchase canned chiles. The quality is lower, but generally acceptable, the price is lower, and the convenience is nice. You will not generally have much choice of heat levels (most canned chiles are Mild) but this can be fixed.
    1. At least in Colorado, Costco sells small (4oz?) cans of roasted chiles in flats of 12. I supplement proper roasted chiles with these, but you can use them exclusively if it's what you can get.
  3. Purchase fresh Anaheim chiles and roast them yourself.
    1. See the instructions in the source recipe for the general idea, but I never found things to work out the way they said.
    2. If you go this route, I recommend roasting on a gas (or even charcoal) grill instead of under an oven broiler. Use maximum heat, and turn the chiles when one side gets sufficiently blackened.
    3. After roasting, let the chiles steam in a sealed container or covered with plastic wrap, then process.
    4. It's a good idea to do this whole operation the day before, rather than try to do it while the meat is cooking as in the source recipe. That way lies madness.


What kind of chiles should you get? Hatch or Pueblo if available, otherwise Anaheim as above. Mild chiles are a good choice if you didn't grow up eating green chile, or are generally not a spicy food person. Medium chiles are preferred if you can handle it. Hot chiles will generally produce an unpleasant end product unless you are a capsaicin addict, and should only be used in a blend with milder chiles. Note that the actual spiciness of "Mild", "Medium", etc. chiles will vary season-to-season.

The source recipe suggests adding a few jalapenos in addition to the more substantial chiles. I no longer do this, mostly to simplify the recipe, but it adds a nice flavor and is an easy way to add a little heat. Resist the temptation to add a habanero unless you are making a restaurant-sized batch. Habanero powder, however, is a great way to add heat without changing the flavor or texture. A little goes a long way!

Procedure

Definitely read this all the way through before starting, it's a bit scattered.

The Meat

  1. Cut the pork into small cubes, <1" recommended, but do as you like. If the pork is extremely fatty, you may trim it and discard some of the fat, but a significant amount of fat is desirable. "Not fatty enough" is a more common problem.
  2. Season the cubed pork with salt and a little black pepper (advanced: add other seasonings as desired). Let sit for 30+ minutes.
  3. Place the pork in your dutch oven and cook on medium heat, covered, for 20 minutes, stirring occasionally.
  4. Uncover, and raise the heat to medium-high. Stir often, and cook for an additional 40+ minutes(!).
    1. (While this is going on, see the next section.)
    2. The goal is to cook off all the water, and for the pork to brown in its own fat.
    3. You'll know you're done when most of the pork pieces are browned on all sides. Stop if they start to look excessively crispy, like bacon.
  5. Remove the pork from the pot with a slotted spoon and set aside. There should be a glorious pool of pork fat left in the pot. Placing the pork cubes in a strainer will enable you to recover even more fat.


The Vegetables

  1. Dice the onion and set aside.
  2. Chop half of your chiles into roughly ¼" pieces and set aside.
  3. The other half of the chiles goes in the food processor, along with approximately ¼ of the diced onion.
    1. Pulse to your desired texture. I recommend stopping short of a full puree.
    2. This is one of the main ways to influence texture. Experiment!
  4. Set aside the processed chile/onion slurry in a bowl
  5. Add the can of diced tomatoes to the food processor and pulse to desired texture. This can be comparable to the chile/onion mixture, or chunkier, or smoother… totally up to you.


The Aromatics / Roux

  1. Add the diced onion to the pool of pork fat, and reduce heat to medium
  2. (If your pork was on the lean side and you ended up with insufficient fat to cook the onions in, add some vegetable oil... or bacon grease)
  3. Stir continuously
  4. The source recipe says the onions will be "lightly browned" in 5-7 minutes, which is absurd. A good target time is 15 minutes, though feel free to go longer if you prefer more browning.
  5. Add the flour and keep stirring. This is a major determinant of final texture. There is a substantial difference between ¼ cup and ½ cup in terms of the final result (more = thicker, of course). More flour will require more fat. Experiment!
  6. Cook the flour/onion/fat mixture for around 5 minutes. Roux rules apply: the longer it cooks the darker it will get, which contributes more flavor but less thickening.
  7. Add the garlic and cumin and continue stirring for another 2 minutes or until the aroma just drives you mad
  8. (All the cooking times in this section can be varied to suit your tastes)



Into The Oven

  1. Have your oven preheated to 325degF, with a rack set low in the oven so there's clearance for your pot
  2. To the onion/fat/flour/garlic/cumin mixture, add (in any order):
    1. the broth or stock
    2. the (semi-)pureed chile/onion mixture
    3. the (processed) diced tomatoes
    4. the chopped chiles
    5. the pork
  3. Stir, cover, and place in the oven for 90 minutes (longer cook times are ok and will further tenderize the pork)


Serving

  • When the cooked chile comes out of the oven, add the juice of 1 lime (adjust to taste) and stir thoroughly
  • Taste and adjust salt. It's not unusual for 2+ tablespoons of salt to be needed at this point, depending on how much the meat was salted and the salt content of your broth or stock.
  • Serve with rice, tortillas, lime wedges, that kind of thing
  • Top a burger with it, or smother fried eggs, be creative! In Denver we put it on everything


Tips

In practice these steps are not performed linearly. Since the meat takes so long to brown, I use that time to do the "Vegetables" steps so those things are ready. This is also a good time to peel and mince the garlic.

The bottom of your pot will start looking pretty brown and possibly a little burnt by the time the pork is finished browning. Don't worry, the time in the oven will loosen that up and it will contribute to the overall flavor.

My dutch oven is 6.5 quarts which is just barely enough for a double recipe. YMMV.

Restaurant-style green chile tends to be much less meaty than this recipe, but everyone in my family prefers it this way. It's unquestionably more satisfying to eat on its own with a high meat content, whereas restaurants tend to use it more as a sauce. Using less meat, or more broth/stock, are easy and safe tweaks to this recipe.



Wednesday, March 16, 2022

The Legendary Comm Engine

Long ago, at Company S, there was the Comm Engine. This was a powerful piece of software that mediated communication between our system and humans.

It sent a lot of email, both HTML and plain text. Something like 5 million a week just in newsletters, plus who knows how many emails related to product interactions, billing notices, sales campaigns, and so on. It could also send SMS, faxes (lots of faxes), and make phone calls that connected to a TTS/IVR system. Comms could be sent immediately, or scheduled for any arbitrary time in the future.

Emails in particular were enormously sophisticated. The content of a message was produced by rendering a specified template. Data could be passed to the template in the comm record -- the comm engine used database tables as its input -- which was the highest-performance option. But each template was arbitrary code running in our application, so it could do anything, including look up additional data in our database as it rendered. Not only did this allow us to customize each message to its recipient, it let us do so using the most current data possible, since the engine was directly connected to our single production database, which contained all our authoritative customer data.

If you've used a commercial ESP (email service provider), you probably understand how powerful this is. You probably push data to your ESP in a daily batch, or perhaps you update certain data points in real time. However you do it, the data there always gets stale at best and de-synced at worst. And no matter how well you manage it, it will only ever hold a subset of what your system knows about your customers, and juggling the set of fields you upload is a permanent maintenance headache.

We injected our own tracking into these emails. Tracking links are now perfectly common, and no one is surprised to see an email link pointing to Substack, Tinyletterapp, Mailchimp, Marketo, and so on. But way back in the early 2000s, the comm engine was adding URL parameters to links that let us track clicks not just to a template, but to the specific message (which again was a database row) the link was embedded in.

Again, if you've used a commercial ESP, and in particular dealt with their analytics data, this ought to blow your mind. While it's common to be able to get files of standard email metrics -- sends, opens, clicks, bounces, unsubscribes, spam reports -- that can get you to some useful aggregates, being able to trace the behavior of individual messages is unheard of (outside of transactional email, e.g. Sendgrid). With a commercial ESP you typically can't even connect opens or clicks to a specific send. Not only could the comm engine do that, but links in messages were themselves database entities, so you could look them up and see where they pointed. You could even request a rendered copy of a particular message, even one sent years prior.

No ESP on the market today, that I know of, comes close to offering the features that Company S' comm engine had seventeen years ago. Of course they offer different features that we didn't have, like WYSIWYG template editors, integration with your favorite cloud CRM, stale data, and 6-figure price tags. (Less sarcastically, the commercial ESPs all handle email deliverability for you, which you definitely want, and which Company S handled themselves at great expense.) Tellingly, everyone I know that spent time around the comm engine, and subsequently used commercial ESPs, wishes they had the comm engine. Often they contemplate building their own, but this is invariably shot down as being too costly.

How is this possible? How can there be multiple billion-dollar companies offering email-sending-as-a-service that lack feature parity with an in-house system built by line-of-business Java developers in a flyover state nearly twenty years ago?

A boring part of the answer is that the software market just sucks. Folks building and running software businesses don't really know what their customers want. Finding out is not as simple as "asking them", much as we would like it to be, because customers lie. And even when being truthful, customers often don't really even know what they want, or are mistaken about their own needs. Internal software systems can have these problems too, but they tend to be mitigated somewhat in startups that are trying desperately to survive in tough markets. The developers and the users of the comm engine not only worked under the same roof, but to some extent were the same people, which even further mitigates the difficulty of figuring out the right thing to build.

A slightly less boring part is that the features Company S built into the comm engine were, to a large extent, fairly specific to its idiosyncratic needs. The comm engine could send faxes because Company S' customers wanted faxes. The comm engine had TTS/IVR because smartphones (as we know them) didn't exist yet. The comm engine used JSPs for templating because Company S was a Java shop.

A much less boring part is security. Can you imagine an ESP offering "upload your own code that can do literally anything" as a templating system? Seems unlikely.

But I will assert that the main reason is that the comm engine wasn't subject to the data integration needs that every cloud SaaS product is, and that this allowed it to achieve functionality that cloud SaaS cannot match. By this I mean that the comm engine didn't require us to move data to it, because all of our data was always within its reach, nestled safely in our single central database. Any cloud SaaS product, whether it's an ESP, a CRM, a help desk system, or a ticket tracker, doesn't have your data in it.

Your data lives in Your System, whatever that is, presumably something home-grown that runs your business and serves its unique needs. It's where users get created when they register, where orders get created when a customer checks out, where subscription records get created when a customer starts the free trial, and so on. Because data must be physically adjacent to a CPU in order to be processed, making the data in Your System available to be used by cloud SaaS software requires transferring a copy from here to there. There is no way around this. It is a fundamental fact of how computers work.

Moving data is expensive and slow! An ESP has to pay to store whatever you upload to them, and will generally place arbitrary limits on the size and complexity of such data. Those uploads will probably hit API servers that impose harsh rate limits (some older ESPs accept file uploads via FTP, which is actually better!). The more data you want to be able to use in your comms, the more you will have to upload, and the slower this will all get.

Even once you've uploaded your data, the ESP will provide only limited capability to use it in message templates. They may offer a standard templating language like FreeMarker or Handlebars, or they may have made up their own. In either case you're stuck with it, and won't be able to extend it or substitute your own.

So the fundamental advantage that the comm engine had was that it was part of our system, which let it leverage the capabilities of that system. All the data in our database and all the functionality in our application code was available to it. And since we were both its designers and its users, we were able to make it do exactly what we needed it to.

It might be possible to match some of this by building software that customers install and run, like in the old days. I have my doubts, because integrating into someone else's system with no a priori knowledge is a daunting task, but it is strictly possible and I can imagine ways that it could work. But a cloud SaaS product could never do so. And since cloud SaaS is how approximately all software is now built, I predict that Company S' legendary comm engine will never be equaled by commercial software.

Saturday, October 23, 2021

Advent of Code 2020

 Uh dude, wasn't that in like, December?

Yeah, well, I started it in December and then life happened. I just finished today (ed: 09-OCT).

I don't precisely remember why I started. Maybe someone mentioned it in work Slack and it seemed fun? In any case, when I started I couldn't decide what language to use, so I did each problem twice: once in pure SQL for a challenge, and again in Clojure as an exercise in learning the language.

(Spoilers follow)

Code here: https://github.com/slotrans/advent-of-code-2020

Day 1, "Report Repair": Trivial. There might be a non-brute-force solution but for such tiny input, I couldn't be bothered. Looking back at my solutions, my SQL solution was pretty straightforward (and beautifully declarative), but I was flailing in Clojure.

Day 2,  "Password Philosophy": Also pretty trivial. This was months ago now, but I can remember the SQL solution flowing right out of my fingers. My Clojure was improving but still awkward. Parts of it look downright silly in retrospect.

Day 3, "Toboggan Trajectory": Now we're talkin'. This was fun! The first of many grid-based/spatial puzzles. This one highlighted a couple of interesting things about SQL. First, the fact that the input is ordered requires special handling at load time, because relations are not naturally ordered. Second, the way SQL performs iteration implicitly through consuming input rows (and producing output rows) rather than explicitly through looping or recursion. On the Clojure side, my comfort level was clearly improving, with recursion in particular. I hadn't yet stumbled on the most convenient ways of modeling grids.
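
One representation that makes grid puzzles much more pleasant in Clojure is a map keyed by [x y] coordinate vectors. A from-memory sketch, not code from the repo:

(require '[clojure.string :as str])

;; Hypothetical helper: turn the raw puzzle text into a map of [x y] -> character.
;; Lookups, bounds checks, and neighbor math all become trivial map operations.
(defn parse-grid [text]
  (into {}
        (for [[y line] (map-indexed vector (str/split-lines text))
              [x ch]   (map-indexed vector line)]
          [[x y] ch])))

(get (parse-grid "..#\n#..") [2 0])  ;; => \#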

Day 4, "Passport Processing": This one is mostly about string processing, specifically handling the irregular input format. Parsing the input in SQL was gnarly. Clojure's string and sequence processing really started to shine here. This is one of the few puzzles that resembles what I'd call "business" code.

Day 5, "Binary Boarding": Pretty straightforward once you get past the obfuscation of the input. Postgres' translate function and bit(N) type helped keep the SQL solution relatively compact. Pleased to find Clojure has a reasonably easy way of parsing binary literals (which I then forgot and had to look up again for a later puzzle).

Day 6, "Custom Customs": Again SQL struggles with wrangling the input. This isn't surprising by the way, since typically the structuring of input into nicely labeled and typed fields is done before data ever hits the SQL layer. It's wild how much shorter the Clojure solution is.

Day 7, "Handy Haversacks":  This is where things start getting a little crazy. The structure of the puzzle is inherently recursive which poses a big challenge for SQL, and I remember having to fiddle with my solution quite a bit and re-read the documentation for with recursive several times. For Clojure, I remember having to think for a while about how to structure the input rules as data.

Day 8,  "Handheld Halting": This one really pushed SQL to the limit, requiring (or at least, I used) both recursive CTEs and lateral joins. The Clojure code is much more straightforward. Also I can see this is the first puzzle for which I included a (def sample-input ...) to test my solution against the provided sample in the same way it would run against the full input.

Day 9, "Encoding Error": The SQL solution to part 1 is wonderfully compact, but part 2 required an odd combination of window functions and a lateral join. The Clojure code for this one is fairly unremarkable, though I do cringe a little at (vec (for ...)), apparently I had not yet discovered mapv.

Day 10,  "Adapter Array": Part 1 is fine, whatever, but part 2... this one pissed me off.

Have you ever heard the riddle "why are manhole covers round?" It's a bad riddle, because if you've heard it you know the answer, and if you haven't heard it you have no way of finding the answer, besides experiencing a miraculous logical leap. Puzzle 10 part 2 is like that, because if you've seen the climbing stairs problem, and notice the correspondence with this "joltage" nonsense, the solution will be obvious and easy. If you haven't, then you'll need either the aforementioned logical leap, or end up doing something dumb like I did (i.e. attacking the problem very literally).
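
For the record, the logical leap boils down to a counting recurrence: the number of ways to reach a given joltage is the sum of the ways to reach the three joltages just below it. Something like this sketch (written with hindsight, not the literal-minded thing I actually did):

;; ways(j) = ways(j-1) + ways(j-2) + ways(j-3), counting only joltages that exist.
(defn count-arrangements [adapters]
  (let [joltages (concat [0] (sort adapters))
        ways     (reduce (fn [acc j]
                           (assoc acc j (+ (get acc (- j 1) 0)
                                           (get acc (- j 2) 0)
                                           (get acc (- j 3) 0))))
                         {0 1}
                         (rest joltages))]
    (get ways (last joltages))))

(count-arrangements [16 10 15 5 1 11 7 19 6 12 4])  ;; => 8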

I very nearly ragequit after this one.

Day 11, "Seating System": Another grid puzzle, and the first one involving cellular automata! It's also, a bit sadly, where I gave up on writing SQL solutions, as they were becoming increasingly convoluted. Anyway I found it fun, and it served as a nice base to come back to when tackling future cellular automata puzzles.

Day 12, "Rain Risk": Neat puzzle. The most interesting part was figuring out how to model the instructions and then how to interpret them. This is one of those puzzles where part 2 completely upends what you did for part 1.

Day 13, "Shuttle Search": Not counting Day 10, this is the first one I hit a real performance problem on. More specifically, my naive solution was too slow to ever finish. Again I had to look at others' posted solutions for some inspiration. And again, I don't know that I ever would have figured out the proper technique without being shown. Perhaps for that reason, this is the last one I did in-order in December. At this point I skipped ahead to Day 24 (it was already 26-Dec) to see "the end", and then stopped.

Day 14,  "Docking Data": This is where I resumed, on 31-Aug. It was interesting to see how well Clojure had "stuck" to my brain, after having not used it at all for 8 months. Bit masks are fun.

Day 15, "Rambunctious Recitation": I suspect there's a pattern to this one that can radically simplify things, but I couldn't see it, so I forged ahead with brute force and literalism, as usual. Interestingly, part 2 is identical to part 1 except for being pushed to an absurdly large number of iterations. My code wasn't fast enough, so I did peruse a few other solutions for ideas. In the end though I just took an educated guess at where the slowdown was, and it turned out to be correct.

Day 16, "Ticket Translation": I enjoyed the deductive reasoning aspect of this puzzle. I definitely felt like the solution-finding routine would have been much more naturally expressed in an imperative language with a "while" loop.

Day 17, "Conway Cubes": This was probably my favorite puzzle. It turned out to be helpful that I'd already done Day 24, which has a lot of similarities. The best part was finding that, other than having to re-write my "get adjacent coordinates" function (which I had been stubbornly unrolling by hand rather than expressing with nested loops), modifying my solution from 3 dimensions to 4 for part 2 required only the most trivial code changes.
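
The nested-loop version of that function looks roughly like this (a sketch; the code in the repo differs in its details):

;; All 26 neighbors of a 3D coordinate. Going to 4D is one more binding
;; and one more coordinate.
(defn neighbors-3d [[x y z]]
  (for [dx [-1 0 1]
        dy [-1 0 1]
        dz [-1 0 1]
        :when (not= [dx dy dz] [0 0 0])]
    [(+ x dx) (+ y dy) (+ z dz)]))

(count (neighbors-3d [0 0 0]))  ;; => 26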

Day 18, "Operation Order": This was one of the hardest for me. Part 1 I was able to do, but when part 2 introduced precedence rules I got stuck. I spent a day thinking about it and trying a few approaches, but got nowhere. I found a tutorial on writing a simple calculator-interpreter in Python, but it was brutally stateful. I spent some time trying to adapt it to be functional but again got stuck. I ended up implementing a transliteration of the tutorial's code using atoms for mutable state. It's VERY gross. I would still like to see a proper functional solution to this one.

Day 19, "Monster Messages": This was the hardest. It took me 5 days, and careful study of several other solutions. I'm sure that any CS majors who happen to have written a regex engine had no trouble with this, but I sure did. I was able to solve part 1 without too much trouble, but got very confused when my solution didn't work at all for part 2, despite only minor modifications to the problem. Unlike 10 and 13 though, this one didn't make me angry, because it's a fair problem that can legitimately be figured out with no prior knowledge.

Day 20,  "Jurassic Jigsaw": I enjoyed this one a lot, because it had a surprising amount of depth. The code for this one is the longest by far, almost double the next largest. Like 19, it took me 5 days to complete, but happily I didn't need to consult any other solutions. This is another one where data modeling is key. Parts of this would definitely have been easier in an ordinary mutable-state language. Also I finally used reduce comfortably! Twice!

Day 21,  "Allergen Assessment": Effectively a re-hash of Day 16, with deductive reasoning. I infer that there's an "easier" (less general?) solution to just part 1, but I ended up going straight for the full solution so I needed only a trivial amount of extra code to complete part 2. After the early pre-processing steps, I actually worked out the solution "on paper" and then backed into a code solution. Some of the instructions feel like red herrings... the statement "allergens aren't always marked" is highlighted, but as far as I can tell doesn't actually matter?

Day 22, "Crab Combat": Fun with recursion. Getting part 2 right turned out to be very fiddly. I was happy this didn't turn into another performance problem...

Day 23, "Crab Cups": ...but this one sure did! This is another one where the solution to part 1 should in theory work for part 2 without modification. And I'm sure it would have, if I had a few months to wait for the answer. The direct, sequence-splitting-and-reassembling strategy solves the problem very simply but doesn't turn out to be efficient.

I ended up writing the most solutions for this one. The original slow Clojure solution. A second Clojure solution that leaned on mutating a java.util.LinkedList and was significantly faster, but still too slow. Then a pure Java solution based on that, which was faster yet but still took around a full day to run 10M turns. The solution I actually got an answer from -- in ~2.5 hours -- was a pure Java solution that (still quite naively) manipulated an array of integer primitives.

Once I had an honestly-earned answer to my very last puzzle (I had already done 24 and 25), I went looking for performance help. I suspected all along that what I needed was the right data structure, and when I saw it, it was one of those forehead-slapping moments. I had never thought of modeling a linked list as a map of value->next_value. Rewriting the code for that model took some time, but my final Clojure solution ran in 83 seconds.
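
To make the idea concrete, here it is in miniature (a toy sketch, not the final solution):

;; The "circle" of cups as a map from each cup to the cup after it:
;; 3 points to 8, 8 points to 9, ..., and 7 wraps back around to 3.
(def cups [3 8 9 1 2 5 4 6 7])

(def next-cup
  (into {} (map vector cups (rest (cycle cups)))))

;; Walking the circle is just repeated lookup...
(take 5 (iterate next-cup 3))  ;; => (3 8 9 1 2)
;; ...and moving cups is a handful of assoc calls instead of splitting and
;; reassembling a million-element sequence.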

Day 24, "Lobby Layout": Another favorite. I was thrilled to finally have an excuse to learn how to model a hex grid. Working in a weird coordinate system, and dealing with cellular automata rules in a sparse space, made for a super engaging and rewarding problem. As I mentioned earlier, I skipped ahead to this one after 13, which turned out quite nicely since it was a jump from basically the second-worst to the second-best puzzle.
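
If "model a hex grid" sounds mysterious: the usual trick is to give each of the six directions a fixed offset in axial coordinates, so every tile is just a [q r] pair. A sketch, not necessarily what the repo does:

;; Axial-coordinate offsets for the six hex directions.
(def hex-dirs
  {"e"  [1 0],  "w"  [-1 0]
   "ne" [1 -1], "sw" [-1 1]
   "nw" [0 -1], "se" [0 1]})

;; Follow a path of direction strings from the reference tile at [0 0].
(defn walk [path]
  (reduce (fn [[q r] dir]
            (let [[dq dr] (hex-dirs dir)]
              [(+ q dq) (+ r dr)]))
          [0 0]
          path))

(walk ["e" "se" "w"])  ;; => [0 1]

The sparse "grid" for the automaton can then be nothing more than a set of black-tile coordinates.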

Day 25, "Combo Breaker": A nice little gimme at the end.


A few general observations...

Since the puzzles don't build on each other, the solution to each one can be treated as throwaway code. Most of the usual best practices can be thrown out the window. When I did look at others' solutions, I observed quite a few single-letter variable names and other signs of code optimized for speed-of-writing rather than clarity. You can certainly do that if you want, but I doubt that typing out readable names is what's going to stop you from solving each puzzle in one day.

Many puzzles are like a shadow of a much more general problem, but the speedy path to a solution lies in adopting as many assumptions about the input as possible. Whatever letters or numbers appear or don't appear, maximum lengths of strings or sizes of numbers, any shortcut you can take, take it.

Sometimes though, that can come back to bite you in part 2. The part 1 -> part 2 transition is really the only way in which these puzzles resemble Actual Software Engineering, because it represents an unforeseen change in requirements. In fact, after working through a handful of problems I found myself trying to anticipate what direction part 2 might go in, and weighing the risk/reward of making some particular function more or less general. In practice one can't actually anticipate part 2 with any useful accuracy, but I did find that "knowing there would be change" generally put some gentle pressure on me to write more flexible code in part 1.


On Clojure...

In theory, functional languages are great for these puzzles, because each one is just a computation. No file I/O, no network I/O, no interactive user input. The only side effect needed is a final print-to-the-screen. And it turned out that none of these puzzles truly needed (e.g. for performance) mutable state either. So it's a reasonable proving ground to take a language like this for a spin.

Clojure has a modest learning curve. You can get going with just a few basics, but to solve non-trivial problems you'll very quickly need to familiarize yourself with a large number of tools. As I've argued elsewhere, Clojure's promoters claim it to have a "small" syntax, which is nonsense. There are a bunch of fiddly special characters to learn, and a zillion standard library functions to memorize. While there are certainly a few obscure things you're not likely to need, like cycle or interpose, you are definitely going to need all of (in random order): for, map, doseq, filter, flatten, some?, every?, loop/recur, contains? (but also .contains), cond, first, rest, take, drop (and/or nthrest), into, merge, count, assoc/dissoc, apply, reverse, identity, nil?, key/val, get-in, conj, disj, remove, repeat, range, subvec, nth... and probably some I'm forgetting. On top of that, you'll have to learn a number of idiomatic compositions of these functions, such as making an equivalent to Python's dictionary comprehension using into {} combined with for or map. It's not intractable, but you will flail for a while, and write dumb stuff that will embarrass you later. It will also be difficult to find the right incantations (e.g. on Stack Overflow) because Clojurists unfortunately tend to post about the most "elegant" or theoretically idiomatic solutions, rather than the most comprehensible ones.
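
For the curious, that dictionary-comprehension equivalent looks roughly like this (a throwaway example, not something from my solutions):

;; Python: {w: len(w) for w in words}
(def words ["pork" "chile" "cumin"])

(into {} (for [w words] [w (count w)]))
;; => {"pork" 4, "chile" 5, "cumin" 5}

;; or, equivalently, with map:
(into {} (map (fn [w] [w (count w)]) words))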

One thing that really shines in the language is string processing. The inputs for Advent puzzles are generally supplied as simple text files, and often there's structure that needs decoding. It's joyous that reading a file into a string is as simple as (slurp "filename"). Splitting on delimiters or patterns, iterating over lines or characters, parsing strings into numbers, all very simple, very easy. The function clojure.edn/read-string is particularly lovely to use, as it solves the problem of "treat this string of numerals as if I had typed it in source code", yielding a long or double as appropriate, without needing to be told.
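
A typical input-reading preamble ends up looking something like this (sketched from memory; the path is made up and the real solutions differ in detail):

(require '[clojure.string :as str]
         '[clojure.edn :as edn])

;; Hypothetical input file: one number per line.
(def input (slurp "input/day01.txt"))

(def numbers
  (->> (str/split-lines input)
       (map edn/read-string)))  ;; "1721" -> 1721 (a long), "1.5" -> 1.5 (a double)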

Overall I really enjoyed using the language and hope I someday have an opportunity to use it for something real.

Wednesday, September 29, 2021

Tags Are A Bad Data Model

We've all seen tags, right? Twitter, Instagram, Steam, Stack Overflow, Bandcamp, just about every blogging engine or CMS… all have tags. Sometimes they seem useful but mostly not. Why is that?

Tags are a fundamentally bad data model, because they offer exactly one extremely weak semantic.

Real quick, what do tags look like? Here's a basic relational implementation. Let's assume we have a post table, and we want to have tags and be able to associate tags to posts.

create table tag
(
  tag_id  serial        primary key
, name    varchar(200)  not null
);

create table post_tag_map
(
  post_tag_map_id  serial  primary key
, post_id          int     not null
, tag_id           int     not null
);

create unique index uidx_posttagmap_postidtagid on post_tag_map(post_id, tag_id) ;

There, those are the absolute basics. You would want other stuff like foreign keys, more indexes, created timestamps, and maybe what user created a tag or association, but this is the core of the model.

What can we do with this? Well, we can enumerate all tags that a post has, so that we can show them. Or, given a tag, we can enumerate all posts that have that tag. And... that's it.

To see how weak the semantic really is, let's imagine doing some basic analytics on our tags. How would we summarize the tagging of posts?

select p.post_id
     , p.name
     , max((t.name = 'fiction')::int) as TAG_FICTION
     , max((t.name = 'mysticism')::int) as TAG_MYSTICISM
     , max((t.name = 'politics')::int) as TAG_POLITICS
     , max((t.name = 'wtf')::int) as TAG_WTF
     , max((t.name = 'statistics')::int) as TAG_STATISTICS
     , max((t.name = 'humor')::int) as TAG_HUMOR
     /* many more... */
  from post p
  left join post_tag_map ptm on(p.post_id = ptm.post_id)
  left join tag t on(ptm.tag_id = t.tag_id)
 where 1=1
 group by p.post_id
        , p.name
;

That's the best we can do. A seemingly-endless bit array of 1/0 (or true/false if you like) flags showing whether any particular post has any particular tag. If new tags are added, we need to adjust our query (and table, if we store these results for easy use).

No tag ever conflicts with any other tag. If we have tags for "red" and "blue" and "green", a post can have all of them. If we have tags for "fiction" and "non-fiction" a post can have both of them. Remember, each one is just a flag, and they are all independent of one another.

In fact, we can describe our original data model a different way...

create table post_tag
(
  post_id         int      primary key
, tag_fiction     boolean  not null
, tag_mysticism   boolean  not null
, tag_politics    boolean  not null
, tag_wtf         boolean  not null
, tag_statistics  boolean  not null
, tag_humor       boolean  not null
/* many more... */
);

 ...where the ability to add columns -- at runtime! -- has been delegated to users, whether that be end-users or admins.

That's all the tags data model is. An infinity of boolean flags. No categories. No hierarchies. No key/value pairs or additional detail. This is all you get.

That is an incredibly weak semantic! It's terrible!

 

It only ever works where you, the system designer, fundamentally have no knowledge -- and never will -- of what kind of meaning your users might want to impute to Things in your system.

It works on Twitter because the breadth of topics that get discussed on Twitter is up to Twitter users, and changes constantly. If someone wants to try to create a Schelling point around some topic by using a #hashtag, they can. Maybe it'll catch on and maybe it won't. Maybe the word or phrase they chose is awkward or ambiguous or otherwise fails to communicate meaning. Maybe it's disingenuous or an outright lie. It's (arguably) not up to Twitter to manage this, and given the speed at which trending topics morph there's really no way they could.

It works on Bandcamp because artists choose their tags, and they choose them with purpose in mind: self-identifying with genres or styles, to aid discoverability.

It works on Steam for similar reasons. Users tag games to help other users find games they might like. It helps that users have coalesced around a fairly finite set of popular tags which doesn't change much over time, and the Steam UI highlights only the most popular tags (which in turn relies on having a big, engaged user base).

It probably won't work very well on your blog. Absent any conceptual framework, you'll struggle to think of what tags each post should have. The tags I used in the example are real, taken from a blog I've read for years. They don't make a ton of sense, and have never been useful for... anything.

It definitely won't work within your business software, because you need stronger semantics! You need categories, and hierarchies, and sets of mutually-exclusive/collectively-exhaustive values. Maybe some flags, sure, but specific flags with specific meanings. And all of this needs to be designed by the folks that build the software, not left up to users (at least not by default).

Monday, April 20, 2020

Chocolate Chip Banana Bread

This is AB's banana bread recipe from IJHFMF with a few tiny modifications and comments. I've made it probably 10 times over the last year.
 
Ingredients
(dry team)
220g AP flour
35g oat flour, which means 35g oats put through a spice grinder or food processor
1 teaspoon salt
1 teaspoon baking soda

(wet team A)
1 stick unsalted butter, melted and cooled
2 eggs
1 teaspoon vanilla extract (or almond extract etc if you're feeling adventurous)

(wet team B)
4 bananas, extremely overripe, like seriously they will be nearly black, it takes a couple of weeks for them to get this way
180-210g sugar, to taste (original recipe says 210). I like to sub a little brown sugar or honey.
 
(misc)
extra butter for pan
dark(!) chocolate chips (only optional if you don't like being awesome)
chopped nuts (pecans or walnuts)

Note: if you have 6 bananas, or whatever, this recipe can be scaled up, but as written it fills a loaf pan so you will need additional pans.


Tools
kitchen scale (we bake by weight, not volume!)
3 bowls
mixing spoon
electric hand mixer (optional)
loaf pan (mine is about 10"x5"x3") or muffin pan(s)
parchment paper
oven (duh)
cooling rack


Procedure
1. Peel the bananas and pile them in a bowl. This step is first so that you can abort if you find that they're moldy.

2. Pre-heat the oven to 350degF

3. Melt the butter and set aside to cool.

4. Assemble wet team B by adding the sugar to the bananas and mashing/mixing thoroughly. I use a hand mixer.

5. Assemble the dry team. Toast the oats before grinding if you're feeling ambitious.

6. Finish wet team A by adding the eggs and vanilla to the butter and mixing gently. Just break the egg membranes and scramble them a bit. If the butter is hot when you do this it will cook the eggs and that is Bad.

7. Add wet team A to wet team B and mix. Again I use a hand mixer.

8. Add the combined wet team to the dry team. Mix only until combined (meaning no pockets of un-moistened flour). If your bananas were huge, or you used more than 4, the batter may seem too wet. Add a bit of extra flour. Getting this right takes practice.

9. Mix in chocolate chips and/or nuts to taste.

10a. If using a loaf pan, rub the inside with butter and then line with parchment paper (only the long sides need to be papered, the short sides will be touched by the batter and this is fine)
10b. If using a muffin pan, use muffin wrappers (or don't, in which case you're on your own)

11. Pour in your batter. For the loaf pan this is simple. For muffins, I haven't yet figured out how much should go in each one. Best of luck.

12a. For a loaf, bake in the center of the oven at 350 for 45 minutes then raise to 380-400 (experiment) for 15 minutes more (this browns the outside and firms the crust). Ovens vary and you may need to tweak times and temps. When it's done a toothpick will come out not-quite-clean (unlike a cake). If a toothpick comes out totally clean it's probably overbaked and the voice of Paul Hollywood will haunt your dreams.
12b. For muffins, bake for uhhhh less time than that? I haven't gotten them right yet.

13. Cool on a rack for 15 minutes in the pan, then remove from the pan and cool for 60 minutes or until you can't stand to wait any longer.

14. I wrap the loaf tightly in plastic, then foil, and keep it on the counter. It will definitely keep for a week. You will almost certainly eat it all before a week goes by.

Sunday, April 5, 2020

Far Too Many Words About Airflow

Author's note: I recently wrote the below in nearly-unbroken stream-of-consciousness mode targeted at a specific audience of one. It is reproduced here with just a few minor redactions. The subject/prompt was "why I dislike Airflow".
 

Subject: Airflow

 
I've never used Prefect, but they wrote a detailed piece called "Why Not Airflow?" that hits on many of the relevant issues:

In my own experience with Airflow I identified three major issues (some of which are covered at the above link):
1. Scheduling is based on fixed points
(docs here https://airflow.apache.org/docs/stable/scheduler.html look how confusing that is!)
When we think about schedules we naturally think of "when is this thing supposed to run?" It might be at a specific time, or it might be an interval description like "every hour" or "every day at 02:30", but it is almost certainly not "...the job instance is started once the period it covers has ended" or "The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period", as the Airflow docs describe it. Our natural conception of scheduling is future-oriented, whereas Airflow's is past-oriented. One way this manifests is that if I have a "daily" job and it first runs at say, 2020-04-01T11:57:23-06:00 (roughly now), its next run will be at 2020-04-02T11:57:23-06:00. That is effectively never what I want. I want to be able to set up a job to run e.g. daily at 11:00, and then since it's a little after 11:00 right now, kick off a manual run now without impacting that future schedule. Airflow can't do this. They try to paper over their weird notion of scheduling by supporting "@daily", "@hourly", and cron expressions, but these are all translated to their bizarre internal interval concept.

(Counterpoint: their schedule model does give rise to built-in backfill support, which is cool)

2. Schedules are optimized for machines, not humans
[Upfront weird bias note: I am cursed to trip over every timezone bug present in any system I use. As a result I have become very picky and opinionated about timezone handling.]

We run jobs on a schedule because of human concerns, not machine concerns. Any system that forces humans to bear the load of thinking about the gnarly details of time rather than making the machine do it, is not well designed. Originally, Airflow would only run in UTC. By now they've added support for running in other timezones but they still do not support DST, which basically means they don't actually support timezones. Now, standardizing on UTC certainly makes sense for some use cases at some firms, but for any firm headquartered in the US which mainly does business in the US, DST is a reality that affects humans and that means we have to deal with it. If we deny that, we're going to have problems. For example if I run a job at 05:00 UTC-7 a.k.a Mountain Standard Time, chosen such that it will complete and make data available by 08:00 UTC-7 when employees start arriving to work, I am setting myself up for problems every March when my employees change their clocks and start showing up at 08:00 UTC-6 (which is 07:00 UTC-7!) because they are now on Mountain Daylight Time. If I insist on scheduling in UTC or a fixed UTC offset, I am probably going to have to move half my schedules twice a year. That's crazy! Computers can do this for us!

3. DAGs cannot be dynamic
At the time I was seriously evaluating Airflow at [previous employer], this is what killed it.

A powerful technique in software design is to make our code data-driven. We don't often use that term, but it's a common technique, in fact so common we don't much notice it anymore. The simple way to think of this is I should be able to make my software do things by giving it new input rather than writing new code.

Consider a page like this one (from a former employer): https://shop.example.com/category-slug-foo/product-slug-bar/60774 [link removed, use your imagination]
No doubt you've been to thousands of such pages in your life as an internet user. And as an engineer, you know how they work. See that 60774 at the end? That's an ID, and we can infer that a request router will match against this URL, pull off that ID, and look it up in a database. The results of that lookup will be fed into a template, and the result of that template rendering will be the page that we see. In this way, one request handler and one template can render any product in the system, and the consequence of that is that adding new products requires only that we add data.

Airflow doesn't work this way!

In Airflow's marketing material (for lack of a better term), they say that you build up your DAG with code, and that this is better than specifying static configuration. What they don't tell you is that your DAG-constructing code is expected to evaluate to the same result every time. In order to change the shape of your DAG you must release new code. Sometimes this arguably makes sense. If my DAG at v1 is A -> B, and I change it in v2 to be A -> B -> C, perhaps it makes sense for that to be a new thing, or a new version of a thing. But what if my DAG is A -> B -> C, and I want to parallelize B, perhaps over an unpredictable number of input file chunks, as in A -> {B0, B1, ..., Bn} -> C where N is unknown until runtime? Airflow doesn't allow this, because again our DAG construction code must evaluate to the same shape every run. This means that if we want data to drive our code, that data must be stored inline with the code and we must re-deploy our code whenever that data changes.

This is not good. I have built multiple flows using Luigi that expand at runtime to thousands of dynamically-constructed task nodes, and whose behavior could be adjusted between runs by adding/changing rows in a table. These flows cannot be expressed in Airflow. You will find posts suggesting the contrary (e.g. https://towardsdatascience.com/creating-a-dynamic-dag-using-apache-airflow-a7a6f3c434f3) but note what is going on here: configuration is being fed to the DAG code but that configuration is stored with the code and changing it requires a code push. If you can't feed it input without a code push, it's not dynamic.


Airflow and the team at Airbnb that built it deserve a lot of credit for popularizing the concept of DAG-oriented structuring of data jobs in a way that Luigi (which predates it by years) failed to do. The slick UI, built-in scheduler, and built-in job executor are likewise praiseworthy. Ultimately though I've found that tightly coupling your flow structure to your scheduling system is a mis-feature. The fact that Luigi jobs must be initiated by an outside force is actually a powerful simplification: it means that a Luigi program is just a program which can be run from anywhere and does not (necessarily) require complex execution infrastructure. (Prefect can be used in this way as well, or with its own supplied scheduler.)

I also concede that there is value in wholesale adoption of Airflow (or something like it) as the central unifying structure of one's data wrangling universe. Regardless of the specific tech, having a single central scheduler is a great idea, because it makes the answers to "where is X scheduled?" or "is there anything that runs at time Y?" trivial to find. What's worrisome about Airflow specifically in that role is all the things it prevents you from doing, or allows only through dirty hacks like writing DAGs that use Luigi internally, or using code-generation to push dynamism to "build time".

Lastly, I have to concede that Airflow's sheer popularity is a vote in its favor. There's a lot of enthusiasm and momentum behind it, which bodes well for future feature additions and so on. There are already even managed Airflow-as-a-service products like Astronomer. I think it's still early, though. I've had a serious interest in dependency-structured data workflows since at least 2007, and until I encountered Luigi in 2014 I was aware of zero products that addressed this need, other than giant commercial monsters like Informatica. There's still a great deal of room for innovation and new players in this space.
 
[Original rant concludes here.]

Addendum

For whatever reason this topic keeps turning over in my head, so here are even more words.
 
I recently interviewed with 4 companies, of which 2 are using Airflow and a third is/was planning to adopt it. My current employer also uses it. I have little idea if any of them are actually happy with it, or if they understand the value and/or struggles it's creating for them.

Workflow / pipeline structuring is far from a solved problem. As I noted in my rant, the problem has existed basically forever, but general solutions -- platforms, frameworks -- have only started popping up in the last decade or so (again deliberately ignoring Informatica et al). There seems to be a temptation in the industry to treat Airflow as the de facto standard solution just because it's popular and appears to have a slick UI (the UI is actually clunky as hell, which you will discover within 30 seconds of trying to use it).

The options in this space by my reckoning are:
  • Luigi (Spotify, 2012)
  • Drake (Factual, 2013)
  • Airflow (Airbnb, 2015)
  • Dagster (dagster.io, 2018)
  • Prefect (prefect.io, 2019)
  • AWS Step Functions (AWS, 2016)
  • chaining jobs in Jenkins (2011?)
  • miscellaneous proprietary shit
These are all very different from each other! This is not a choice like React vs Vue, Flask vs FastAPI, Dropwizard vs Spring Boot, AWS vs GCP vs Azure, or [pick your favorite].
 
These tools aren't even all the same kind of thing. Luigi is a library, Drake is a CLI tool, Airflow and Prefect are libraries and schedulers and distributed task executors, Step Functions is a managed service, and Jenkins is a full server process and plugin ecosystem nominally intended for doing software builds.
 
They also differ markedly in how they model a workflow/pipeline. Luigi has Tasks which produce Targets and depend on other Targets, a design which almost fully externalizes state, with the result that Tasks whose outputs already exist will not be run, even if you ask them to. Airflow tightly couples its tracking of Task state to a server process + database, and requires Tasks to have an execution_date, so whether a Task will run if you ask it to depends on whether the server thinks it has or has not already run for the specified date. Drake, like make, uses output file timestamps to determine what needs to run. Jenkins just does whatever you tell it to (unless some plugin makes it work completely differently!).

We don't even have standardized language for talking about these things. Several of these tools use the word "task" to name a major concept in their model. They're usually similar but hardly interchangeable. Perhaps a better example is trying to talk about "dynamic" DAGs, like I mentioned in my rant. I mean something very specific when I say a DAG is dynamic: that the shape of the execution graph is determined at runtime. Other people describe DAGs as dynamic simply because they were assembled by running code rather than specified as configuration data. These definitions are apples and oranges, and the result is a great deal of confusion in discussions of capabilities and alternatives, particularly in the very limited space of public conversation.

I encourage everyone to go out and try this stuff. Build a trivial, dummy pipeline and implement it in 3+ tools. Then repeat that exercise with a small pipeline that does real work and can stand in for the kind of problem you typically tackle. Then start building a solution to a serious problem. You don't have to build the whole thing. If you've gotten this far, simply writing stubbed-out functions/classes and focusing on how they wire up will tell you a great deal. Tasks that sleep for a random time and then touch a file or insert a row are often all you need to simulate your entire data processing world. As a final step think and work through what happens as you change things. Most of these tools don't discuss their implied deployment models, and the devil is in the details.

The bottom line for me is that this remains an active research area, even though I've been working on it for over a decade. I've learned quite a bit in that time but my wisdom remains dwarfed by my ignorance. Don't believe anyone who's trying to tell you that we have this figured out.

Sunday, January 26, 2020

Startups Are (Mostly) A Bad Deal For Employees

I had been kicking around the idea of writing a post about how working for a startup is Actually Not Great for most employees, but Dan Luu wrote it for me so just read his instead:
https://danluu.com/startup-tradeoffs/

Sunday, December 15, 2019

A Different Take on The Failings of Open Source Software

http://marktarver.com/thecathedralandthebizarre.html

The above is a very good article describing why most of the promises and predictions made in ESR's The Cathedral and the Bazaar (1998), about the coming triumph of open source over closed source, failed to come true. As someone who read TCatB and fell for its promises -- in 1998, no less -- I cannot help but nod along to the arguments presented. It's helpful to go back to the early promises because while in 2019 open source may feel dominant, it has nevertheless clearly failed to live up to the original hype.

If anything, the author isn't critical enough when it comes to the claim that the world would converge on optimal solutions and avoid duplicate effort. I've got about ten thousand Javascript frameworks waiting outside to have a word with that one.

But I want to come at it from a different angle, and make what may be a novel criticism of open source.

Back in the day, if you wanted software to run your business on, you had to pay for it. Even if you only need the tools, to allow you to build your own software with which to run your business, you still had to pay for it. Operating systems, compilers, databases: money, money, money. The obvious consequence was that running a business on software was rather expensive. (This world sort of still exists, in Microsoft shops.)

But there was a secondary consequence that made things even more expensive than that: you needed to hire people that knew how to use and take care of all this software. This feels a bit like adding insult to injury, but it was actually something that businesses didn't mind at all, because it fit their existing mental model of how industrial systems worked. You see, if I had a factory, it was obviously necessary to hire people to work in it. If I bought a Bridgeport mill, it was only sensible that I needed to hire a machinist to run it. A crane needed an operator, a forklift needed a driver, and so on. Capital was never sufficient on its own, it was always necessary to add labor to get output.

So even as late as 1999, if you forked out a few million dollars a year for Oracle Database, it only made sense to spend a few hundred thousand extra employing a couple of professional Oracle DBAs. Likewise if you had a fleet of Windows NT servers in the racks, you would have a team of administrators trained (and likely Certified) on Microsoft software to look after them. And so it went for all the large proprietary business software vendors.

Then along comes open source software and... it costs nothing. Oh sure, the message is "free as in speech" not "free as in beer", but in practice it's all priced at $0 and if we're honest that's a big part of the attraction. An interesting thing happens psychologically. Paying $100k/yr for a professional DBA to support a $1M/yr Oracle installation feels very reasonable. Paying $100k/yr for a professional DBA to support a $0/yr MySQL installation... somehow does not.

There's another phenomenon developing right around the same time that reinforces this: the amateurization of business software development. Used to be one needed all this expensive software (and hardware!) to get a tech business off the ground. Then suddenly all you need is a cheap x86 server plus the zero-dollar LAMP stack and you're off to the races. For a while it was easy to dismiss this approach as the domain of hobbyists, but then the hobbyists started launching successful businesses with it, forcing the entire industry to take it seriously. I say "amateurization" because the key driver here was the availability of free (as in beer) software that ran on cheap hardware, which allowed motivated hackers to get experience doing stuff without training, certifications, mentorship, or even (in many cases) college.

This deeply affected the culture of tech companies. In the proprietary high-dollar era, a developer was happy to enlist the help of a DBA, because the DBA was the expert on the database. The DBA was happy to enlist the help of the SysAdmin, because the SysAdmin was the expert on the OS and hardware. The SysAdmin was happy to enlist the help of the Network Admin... and so on. In the LAMP era, it's just four guys in a garage, and they all have to do everything just good enough to ship. The hardware, OS, network, database, compiler suite, various server software, and everything else is easy enough to procure, install, and configure that any motivated hacker can do it. There's neither a need nor time for specialized professionals.

This in turn has deeply affected the career development of technologists. Oracle DBA and Microsoft Server Admin used to be stable, high-paying jobs with long-term career prospects. Satellite firms built businesses around selling tools to these folks. These career-slash-cultures had their own conferences, newsletters, even glossy monthly magazines. Almost all of that is absent from the open source world. Do you know anyone who got training on how to install Linux? Anyone who's made a career out of MySQL administration? Someone certified on nginx?

I think it's been about 20 years since this evolution got going in earnest, so it seems reasonable to take a look back, as the author of the opening link did, and ask where it's gotten us.

In the "pro" column, it's a hell of a lot easier to start a company than it ever has been. If you have an idea and the drive to pursue it, it's never been cheaper or easier to try giving it a go.

In the "con" column, we have a systematic loss of expertise and deep understanding. We assume now that any piece of software that's no further away than apt-get install should be something we can run professionally, in production, with real money on the line, with no training, no practice, hell maybe not even a skim of the documentation.

Tuesday, November 26, 2019

Kubernetes is Anti-DevOps

(bias warning: I think Kubernetes is basically cancer)

So over the last 10 years or so there's been this whole DevOps... movement... thing. The industry got the idea into its collective head that developers should participate in the operation of the software that they build, that operators should adopt practices from development like using source control, and that in general development and operations should work more closely. In the limiting/idealized case, developers are the operators and there's no organizational separation at all.

In general this was a good idea! Developers who have ops responsibilities build software that is more operable, and operators became more effective by using source control and programming languages better than /bin/bash.

There has also been a lot of pretense and bullshit. Companies have undertaken "DevOps transformations" and crap like that which ultimately accomplished nothing at all. System Administrators have had their titles changed to "DevOps Engineers" with zero change in responsibilities or organizational structure. Any company that uses the cloud has declared that they "do DevOps" and left it at that.

And then there's Kubernetes.

Kubernetes runs containers, containers means Docker, and Docker is super DevOps, right?? Yeah, about that...

An interesting thing about containers is they simplify deployment. If I have a Python program that depends on a specific interpreter version and specific versions of a few dozen libraries, I need to be able to manage the deployment target to make sure all that important stuff is there. But if I package it in a container, now it's a singular artifact that I can plop down anywhere and just docker run that action (so the theory goes, anyway).

That simplified deployment can act as an inter-organizational interface. This is a fancy way of saying that it enables that practice we all just decided was bad: developers throwing their code over the wall to be ops' problem. And once they start doing that, the next step is overly complicated systems of containers with elaborate dependencies on each other, and now you need "container orchestration".

Kubernetes thus becomes the icing on the anti-DevOps cake. In theory it enables all this flexibility and solves all these container orchestration problems (that, it should be noted, we didn't have at all just 5 short years ago). In reality it's a hyper-complex operations layer that requires a handful of specialists in order to use at all.

Kubernetes does nothing for the developer, but nor does it hurt the developer. Being just an execution substrate, Kubernetes is irrelevant to the developer. Thus in their ordinary course of business, a developer would have no need to learn and understand how it works. Nor would it be efficient for them to do so, given Kubernetes' off-the-charts complexity. It's reasonable for, say, a Java developer to learn how to manage the JVM as a runtime and what it takes to deploy applications with it. By comparison, learning Kubernetes is like learning how to run an entire private cloud: simply not something it's worth a developer's time to do.

So ultimately, adopting Kubernetes is about the most anti-DevOps move you could make as a software organization. The wall between dev and ops that we've spent the last decade tearing down is going right back up, and we'll set about throwing our code over it. Enjoy!

This all doesn't make the argument as clearly as I would like but hey this is my blog and I get to rant if I want.