Does object orientation really work?

The vast majority of software that people write inside a business talks to a database. Yet object orientated programming does not gel well with databases. The code that looks clean and simple in a modern object orientated program looks positively terrible when you look at the traffic that code causes at the database.
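To make the "traffic" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, the `Customer` class and the `query` helper are all made up for illustration; they are not taken from any particular framework. The object-style version reads nicely but quietly issues one query per customer, while the SQL version asks the question once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total REAL NOT NULL
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders (customer_id, total) VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

statements = []  # every SELECT we send to the database is recorded here


def query(sql, params=()):
    """Hypothetical helper: run a SELECT and log that it happened."""
    statements.append(sql)
    return conn.execute(sql, params).fetchall()


class Customer:
    """Hypothetical domain object with a lazily loaded collection."""

    def __init__(self, id, name):
        self.id, self.name = id, name

    @property
    def orders(self):
        # Looks like plain attribute access; is really a database round trip.
        return query("SELECT total FROM orders WHERE customer_id = ?", (self.id,))


def totals_object_style():
    customers = [Customer(*row) for row in query("SELECT id, name FROM customer")]
    return {c.name: sum(t for (t,) in c.orders) for c in customers}


def totals_sql_style():
    rows = query("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return dict(rows)


object_result = totals_object_style()
object_statements = len(statements)  # one for the customers, plus one per customer
statements.clear()
sql_result = totals_sql_style()
sql_statements = len(statements)     # a single statement
```

Both functions return the same totals. The only difference is how many times the database gets asked, and that number grows with the customer table in the object-style version.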

It's called the object-relational impedance mismatch. The often-aired solution is to use some sort of persistence framework to map your classes to a collection of database tables. There are a variety of packages that do this: NHibernate, Gentle.Net, JPersist etc.

The problem is that none of them work too well. The writers of these packages are much like the inventor of this machine:

[Image: Perpetual Motion Machine]

Solving the impedance mismatch problem is the computer science equivalent of the perpetual motion machine. You can't get it to work because it's impossible; they are simply incompatible ways of working.

A database is essentially a truth reasoning engine. You take a collection of facts and a series of inference rules, and you use these to ask it questions. An object orientated language is essentially about setting up classes of object; templates, if you like, from which instances are created. These instances may be related to each other in a UML diagram but this is not enforced strictly in the code. You certainly can't do reasoning on them in the same way that you might on a collection of tables. Moreover, what does inheritance really mean to a database?

When you try and make the square peg match the round hole, what you end up getting is a really leaky abstraction. The persistence framework writers would love you to believe that you can "forget" about the database and just write lovely object orientated code free of any worry about the vagaries of how a database works. For a while, they might be right but you can bet your bottom dollar that one day that database is going to bitch slap you in some way. Whether it's a particular query running very slow, a bone-headed auto-generated schema or trying to understand why databases don't understand inheritance - it's going to bitch slap you and when it does it'll ask you for interest you owe it for not treating it with the proper respect.

There are people who acknowledge this problem and propose that the solution is an object database. The problem with this solution is that object databases are fast for only a very narrow selection of queries. This is because the way they work is via pointers from one object to another. Traversing these pointers can become a bottleneck under more general use-cases.

My solution is a bit more radical. My view is that the database, not the programming language, is the primary tool in solving business problems. The way to solve the impedance mismatch is to ditch object orientation and follow the conventions of the database.

A case in point: no impedance mismatch exists in C because it is not object orientated. We programmed with procedural languages for decades before object orientation turned up - it's not like programming is suddenly vastly harder when we remove object-orientation.

Actually, I'd go further. My admittedly heretical view is that object orientated design is a profound waste of time. I don't believe that object databases are the right answer because I don't think object orientation is the right way to design a program that talks to a database. It takes a colossal amount of object orientated code to achieve anything useful and when it does so, it usually punishes the very thing that's the hardest, most expensive, to scale: the database.

Think about this for a second: you have to work out how you're going to decompose your program into classes, then you're going to have to type out those classes (or if you're a bit smarter you have a code generation tool to do this). Then you need to write tests that exercise that object model, large parts of which you probably won't really re-use in other scenarios.

Honestly, what's so wrong with just manipulating the data directly? What's wrong with taking the data from a form and doing a straight insert?
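For what it's worth, here is roughly what I mean by a straight insert, sketched with Python's built-in sqlite3 module. The `signup` table, the `save_signup` function and the form fields are invented for the example; the only real requirement is that the form values are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE signup (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        name TEXT NOT NULL
    )
""")


def save_signup(conn, form):
    """Insert the submitted form fields directly; no object model in between."""
    with conn:  # commits on success, rolls back if the insert raises
        cur = conn.execute(
            "INSERT INTO signup (email, name) VALUES (:email, :name)",
            form,  # named placeholders are bound from the dict, never interpolated
        )
    return cur.lastrowid


# Stand-in for whatever your web framework hands you after a form post.
form = {"email": "ann@example.com", "name": "Ann"}
new_id = save_signup(conn, form)
```

The database does the validation that matters here: submitting the same email twice raises `sqlite3.IntegrityError` because of the UNIQUE constraint, with no extra code on my side.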

I think it comes from the fact that people think the object model is more maintainable and more reusable.

Is it really more maintainable? If you change the database you're going to have to update your object model in all but the most trivial of changes. Ahh, you say, but a change in business logic is easier in the object model! Is that really true? A change in business logic implies a change in a database transition. It has to in order for that change to be recorded anywhere. Who's to say that updating a direct access call is going to be any more difficult than updating the object model?

Secondly, you should only make reusable components you actually do reuse. An object model may be flexible but does every problem need that flexibility? Is it worth spending all that time to create that flexibility where no flexibility is required?

This has a real cost too. In Robert Glass's "Facts and Fallacies of Software Engineering", Fact 18 says this:

There are two "rules of three" in reuse: (a) It is three times more difficult to build a reusable component as a single use component, and (b) a reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.

Building a flexible object model will probably cost you more money per unit of functionality. Think about that!

That aside, there's a deeper point here. For a computation to have any impact on the world, it must mutate the state of the machine it's running on. The goal of computation, therefore, is to mutate the state of the machine in just the right way under just the right conditions to record useful information. In real world business programming the goal of a program is ultimately to mutate the database in just the right way under just the right conditions. The program does not exist to be a self-serving statue to object orientated perfection - it exists to manage state transitions in the database.

So why not use the tools the database gives us natively to do this job? Not only does it provide answers to really complicated queries quickly, but it also provides transaction support, views of the data, and the ability to set complex rules to enforce constraints on entries. Taken together, these are features you'd be simply crazy to ignore. They're there to help us write solid applications. They're there to make our work easier!
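Here is a small sketch of those three features working together, again using Python's built-in sqlite3 module. The `account` table, the `owner_total` view and the `transfer` function are hypothetical names chosen for the example. The rule "no negative balances" lives in the schema as a CHECK constraint, and the transaction makes sure a half-finished transfer never sticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- the business rule
    );
    CREATE VIEW owner_total AS
        SELECT owner, SUM(balance) AS total FROM account GROUP BY owner;
    INSERT INTO account (id, owner, balance) VALUES (1, 'Ann', 100), (2, 'Bob', 20);
""")


def transfer(conn, src, dst, amount):
    """Move money between accounts; the database enforces the rules."""
    try:
        with conn:  # one transaction: both updates apply, or neither does
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired; the credit above was rolled back too.
        return False


ok = transfer(conn, 1, 2, 30)         # Ann can afford this
rejected = transfer(conn, 1, 2, 500)  # this would overdraw Ann's account
totals = dict(conn.execute("SELECT owner, total FROM owner_total"))
```

Notice there is no balance check anywhere in the Python. The credit is deliberately applied first, so the second transfer only comes out right because the rollback undoes it.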

And yet people do ignore it! I honestly think it's because most developers hate SQL. They've got this pathological desire to expunge every last SQL statement from their program. To me, it's absolutely necessary to understand this language. Just like you need to understand C to understand how high level languages talk to the low level hardware, you need a deep understanding of SQL to understand how databases represent information. You just do, it's not negotiable.

SQL is a domain-specific language, like XPath or regular expressions. No-one would dream of writing their own XML navigation language or their own pattern matching language. We already have well developed, ubiquitous mini-languages that solve these problems extremely well. If you try and solve this problem independently, it is guaranteed to be less flexible and less useful. Yet when it comes to the problem of how to get data into and out of your application, we invest a huge amount of time in solving problems already solved completely by using the database in the way it's intended to be used.

If object orientation is the casualty of giving proper respect to the database, then so be it.

2008-07-12 16:48:09 GMT | #Programming | Permalink