Justin Francis Self-Portrait

Thursday, May 31, 2007

Avoiding Storing Common One-To-Many Relationships

Occasionally, one comes across common classes of objects that you would like to be able to link to a number of different types of domain objects through a one-to-many relationship. For example, in a Customer Relationship Manager tool, you may want to have comments on many different types of entitities (customers, applications, disputes, etc). Same thing for tickets, attachments and other common structures.

Linking up these objects in the domain model is trivial: a list here, an ArrayList there. The real pain comes when you want to persist these relationships. Conventional wisdom holds you would create a link table for each of the entities you want to link to your common entity. The problem is that this is tedious and time consuming because not only do you have to create the tables, you also then need to modify your loading and saving procedures for the domain object. Because this is a one-many relationship, you will be getting multiple rows back per object, with all the muck that brings.

An alternative approach which we have begun to use at work would be to instead store the link to the domain object in the other table (the one side of the relationship). This way, there are no new tables to create. Of course, the immediate problem is that you cannot have a foreign key from one table to many tables. But imagine if instead of storing a database foreign key in the column, you instead store a "programming foreign key". In other words instead of a database link, store a programming link. In python, this would be the `repr` of an object. In this context, we could store a call to a finder function in this column, and evaluate it on the way out.

Storing a finder call in this way allows the common object to link to any number of different types of domain object. The domain objects just need to iterate over the collection of common objects to find those that link to themselves. When you add a new domain object, all you need to do is add a method that searches this extent. No database modification, no saving or loading code to manipulate.

Using some other tricks from python (though this is generally applicable), you could also use the `str` of the object to have a generic display of the common object's link to the domain object. If the protocol you are using (like python's `repr`) is generic enough, you could literally store anything in the column. If we take the comments example, the comment could have been left by a `User`, a string username or a list of `User`s.

There are downsides to this approach, of course. The biggest one is that ad-hoc queries on those common objects becomes more difficult. The good news is those common objects are not normally the subject of ad-hoc queries because they are not interesting domain objects. Additionally, if your system has the facility for ad-hoc queries in a programming language (no, not SQL), then this is a non-issue (depending on who is doing the querying). Secondly, you cannot enforce referential integrity in such a system at the database level. Again, though, I have not run into referential integrity issues on these kinds of common objects. Integrity is usually important for pieces of the domain that must fit together properly or system failure may occur. Finally, performance may also be impacted depending on how intense the finder function is, though these should be fast if you have designed your system well.

In the end, it is a trade-off between ease of addition of new domain objects and ease of querying and data integrity. For me, the database is just a storage system, so I will take advantage of the programming language I am working in, even if I lose some functionality at the data level.

1 comment:

Iain Lowe said...

Note also that a properly motivated DBA could add the appropriate functions to allow you to do the lookup based on what you are calling a "programming key" but which is, in fact, more of a DSL for looking up objects of that type. In that case ad-hoc querying, even in SQL, might not be quite as painful as it would first appear.