Code Generation is Terrible; I Love It
Is repetitive generated code “DRY”?
If you haven’t heard of “Don’t Repeat Yourself” or DRY, then I probably haven’t reviewed your code. Don’t write the same code twice. If you need to do the same thing in two places, it’s time to refactor and extract that code. This is a pretty simple concept but it’s also incomplete.
I have worked at companies that didn’t do this. What you end up with is copy/paste blocks in 10 places that all function slightly differently. If one has a bug, you’re unsure whether the next copy/paste block has the same bug. You spend your nights debugging why block Y isn’t working right just to find out it was actually block Z.
These days, if I see a merge request that changes the same line in multiple places, my stomach starts to turn. From time to time, it still happens. The culprit is usually politics, management, or “the new guy”. Lately, though, the biggest culprit for duplicating code has been code generators, not people.
My introduction to code generators came in the late 2000s, when I experimented with Ruby on Rails. I watched a tutorial say “run this command…” and had a full “Hello, World” website up in minutes. This was appealing at a time when I was starting new projects every week but never finishing them.
The amazement was short-lived, though, when I found out that a single typo of mine had propagated to 10 different places. Easy to solve: restart from scratch, doing it right this time. If this had happened on a project I’d stuck with past my seven-day attention span, the results would have been disastrous.
Another tool Ruby on Rails shipped with was Active Record, which gave me my first understanding of dynamic metaprogramming as a concept.
For the uninitiated, metaprogramming is the concept of using a program as data for another program. Reflection is the subset of metaprogramming where a program uses itself as data.
The best example of Ruby on Rails’s dynamic runtime reflection is calling a method like “find_by_email” on an Active Record object. Active Record would see the call to a method that didn’t exist and generate it dynamically based on the name and the object, so:
‘find’: We want to find a (single) record of this type
‘by’: We want to find it using a field’s value.
‘email’: The field we want to use is ‘email’.
So now, with enough information to make a guess at what the developer is trying to do, Active Record can make it happen. Does this seem to be DRY? Absolutely! We’ve collapsed an entire collection of similar “find” methods down to one function, nested somewhere in a parent object.
This worked well for Ruby on Rails. Frameworks in many other languages even started to copy it, because it works and is super easy to use. The trick only takes two language features:
Be able to know at run time when a call is being made to a method that doesn’t exist; and
Be able to know at run time what fields are available on an object.
Few features are required, but a few more are needed to make the trick useful. First, Ruby is a dynamically typed language. While not impossible elsewhere, the question of “How do you check the type safety of a method that doesn’t exist?” comes to mind. Tools like inheritance are also helpful for distributing these dynamic methods across objects.
While I absolutely love this type of dynamic metaprogramming as a hobby, it does not do well in large production systems. The two big reasons are speed and reliability. It can be done well, but it can easily be done wrong. That is true of many things in software development, but terrible dynamic metaprogramming doesn’t show up when you compile your code; it shows up when you run it. If you’re lucky, that’s on your machine, and not on a production server you might not even have access to.
Skip forward about 12 years. I’ve graduated and I’m out working for some big computer company, writing tiny container packages in Go, when I start on an existing package that uses an interesting tool: go-swagger. The basics are pretty simple: given an API description, it generates the rest of the code necessary so that you can just add the implementation details.
go-swagger is another example of code generation: the output is executable Go code, just as the Ruby on Rails example above produced Ruby code. The distinction is the input. While Ruby on Rails relied on long bash commands, go-swagger takes a YAML or JSON file as its data. With this one minor change, you can re-run the generation at any time. Changes to the API are bound to happen, so this is very helpful.
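For a sense of what that input looks like, here is a minimal Swagger 2.0 fragment; the title and path are made up for illustration:

```yaml
swagger: "2.0"
info:
  title: Example API   # hypothetical API name
  version: "1.0"
paths:
  /users:
    get:
      summary: List users
      responses:
        200:
          description: A list of users
```

Because the whole description lives in this one file, you can edit it and re-run the generator whenever the API changes.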
It wasn’t too long after working with go-swagger that a change to an API had to be made. Dozens of files were being changed because of one minor tweak. Is this DRY? Has code generation just removed the idea of “refactoring when you need to repeat your code” and exacerbated the problem using automation? And more importantly, why was this a good thing?
Coincidentally, around the time I was hunting for bugs in copy/pasted blocks Y and Z, I read the seminal book “The Pragmatic Programmer” by Andy Hunt and Dave Thomas. This book is where “DRY” comes from. They actually publish the DRY chapter of the ‘20th Anniversary’ edition as a sample, and I’d definitely recommend reading both it and the book.
Unfortunately, I had missed a critical idea in the chapter about DRY. It’s subtle but it’s even in the definition of DRY.
“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system” — Andy Hunt and Dave Thomas; The Pragmatic Programmer
Do you see it? It doesn’t actually mention code. It’s about knowledge. What does that mean? It means this isn’t about refactoring code; it’s about refactoring knowledge, which is a superset of your code. More specifically, “knowledge” here covers everything from the SQL you use and the API you’ve designed to the objects you’ve programmed and even the documentation you write.
So how does this apply to our go-swagger generation? Simply put, there is only one “authoritative representation” in the system, and it’s the API documentation. The API documentation should systematically influence the program’s design when possible.
Why doesn’t go-swagger just dynamically read the API description and produce a functioning API at runtime? Go is a lightweight, performance-oriented, statically typed programming language. While runtime generation would absolutely be possible in almost any language, it would not necessarily be useful once you consider performance.
Go has evolved into a language of small, sharp tools. Its focus on small, directory-sized packages helps make code reusable and easy to read. Packages are statically typed, and “nothing is hidden”. There are no magical constructor methods that happen to perform random tasks when you create objects. Constructors seemed so obvious to me that the thought of a language without them felt almost archaic when I started to learn Go.
Code generation brings metaprogramming to a statically typed language in a performant way. go-swagger does this using a set of library packages for the parts that don’t change from API to API, plus a structured set of implementation functions and data models for the parts that represent your API. This is a great approach, as changes to the library packages don’t need to interrupt the implementation details.
For people fluent in Go, the idea of loading a monolithic package that slowly builds up dynamic functionality the way Active Record does is completely asinine. Some ORMs do exist for Go, but you can’t go more than two comments into a discussion before someone tells you to just use this or that SQL helper and be done with it.
So what about applying go-swagger’s YAML-to-Go generation approach to SQL database management? Well, it exists. A few packages, like sqlingo or xo, let you generate Go code from your database schema. But what if you wanted to use both such a SQL tool and go-swagger together? That wouldn’t be DRY, as you would now have two separate bodies of knowledge that can conflict: your database and your API. I like to think of these as “outside-in” approaches, as parts outside your program dictate or generate what’s going on inside.
Recently, I began working on a project that involved Kubernetes, and some of the Kubernetes tools generate scaffolding and YAML files. The interesting thing I noticed was that these tools take the opposite of the go-swagger approach: they produce YAML by scanning Go code. You can place “markers” — fancy comments — in your code, and the controller-gen tool reads the comments and related code and produces your YAML.
The controller-gen approach was helpful because it had one “authoritative representation” in the system and that was your code. This was an “inside-out” approach, as the code dictated what the YAML would look like, just by existing as code.
This led me to a thought experiment: what would this “inside-out” authoritative-representation approach look like if it were done in its entirety? Well, it would start with some simple Go models mixed with tags, markers, and documentation. The documentation and the model would generate the API description, and the model would generate a SQL database schema and a set of helper functions.
go-swagger does allow something similar to this: you can put comments in your code that generate your API description. This approach is close, but it’s missing one critical piece: it can’t get all of the content from your code, so you still have to duplicate the knowledge about which API routes do what.
As for SQL? I haven’t yet found a tool that does this. I have found plenty of SQL builders, and a few SQL-to-Go generators, but I haven’t found anything that builds my SQL for me.
The Pragmatic Programmer — Andy Hunt and Dave Thomas (20th Anniversary Edition)
DRY chapter of The Pragmatic Programmer, ‘20th Anniversary’ Edition