Piercing the Data Layer with Code Generation

When people think about dealing with a data or persistence layer, they are often thinking about a database, and when they think about a database they often are thinking about an Object Relational Mapping (ORM) tool.

ORM tools are great, if you're in a position to use them. In particular ORM tools are great with green field development, rapid prototyping, small scale projects, or on teams without the need for dedicated database expertise. With some additional ORM expertise, larger, more database performance sensitive projects can also be tackled.

But what happens when you have an existing, large, and very database performance sensitive code base? One with hundreds of existing stored procedure (Sproc) calls to access the data layer. A traditional ORM tool may not be the right approach. ORM's often have a code first, generate SQL mentality. Your app is already in SQL, how can we get the code to talk to it?

Enter code generation. The modern web programmer is already experienced at least in part with code that writes code. Either by accident or design, code ends up in the view layer that is modified by other code statements. This same type of view engine can be used to write more than dynamically shifting client-side javascript. You can also generate code in whatever server-side language you'd like.

To do this you might do some type of basic reflection and string manipulation developed in an internal tool. Web frameworks give an example of how to create templates and might be repurposed for the task. On the other hand you might find a tool like CodeSmith which is built specifically for the task of code generation.

There are many benefits from automating this process. Beyond not having to write simple wrappers for every new or updated sproc, code generation can allow you to create a standardized interface to your database. Under this interface an implementation can be auto-generated to call the underlying Sproc. Having a pre-existing interface to code against greatly simplifies adding additional unit tests to your project, instead of having to hand roll one specifically for testing. It also gives you a central point of control outside of your database to consider other persistence options.

In particular, I like to think of by data layer as a single API. Everything behind it should be a black box to the code. In practice, this has some problems. As multiple developers work against a shared database, changes have to be propagated out as the interface changes. Local database development would be a solution, but comes with its own challenges. A potential solution is to make partial interfaces and classes, one for each sproc so that changes can be taken care of a bit more piecemeal. Partial interfaces and classes however do seem to come with a substantial performance hit to autocompletion tools and some compilers.

A separate but related issue is the addition of additional frameworks or languages. An API interpretation would suggest the same methods are available to each. Developers might however want to keep data layer access code limited on a per technology basis. For example if the GetPersons sproc is used in Java section of a project, but not the Python sections, the corresponding Python data layer access code might omit the GetPersons sproc. Partial interfaces help facilitate some flexibility in this regard.

I hope this article has given you some things to think about when tackling your data layer on your next project. ORM tools solve a code generation problem, but sometimes from the wrong side of the data layer. Make sure to consider other code generation tools when you got SQL written first.