Nov 25, 2013

Repository Pattern Hell (Part 1): Disillusionment

The older I get, the more I dislike the oft-venerated Repository Pattern, at least as it is usually implemented and recommended. In fact, I’ve gone through just about every attitude: from ignorance, indifference, and mild interest, through eager learner, ardent supporter, and frustrated friend, to disillusionment, and now to something close to outright disdain. Yep, I’ve become a hater, which isn’t something I like to be about anything… but here I am.

Here’s the typical scenario I’m talking about when I say the “repository pattern”:

interface IRepository
{
    object Find(int id);
    void Add(object item);
    void Remove(int id);
    void Remove(object item);
    void Update(object item);
}

Or something along those lines. Some languages, like C#, let you write a generic repository (IRepository<T>, for instance), so at least you get concrete types in place of plain object. There are many other slight variations, but the basic idea remains the same: the repository is a class for manipulating data objects without having to know how or where they are stored. So in practice, you’ll likely create a DbRepository with method implementations that access your database to retrieve and save objects.
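For illustration, here is a minimal C# sketch of the generic variant plus one in-memory implementation. The InMemoryRepository<T> and Customer types are hypothetical, not part of any library; a DbRepository<T> would implement the same interface with database calls instead. Callers that depend only on IRepository<T> cannot tell the two apart, which is the whole point of the pattern.

```csharp
using System;
using System.Collections.Generic;

// Generic version of the interface above: T replaces object.
public interface IRepository<T> where T : class
{
    T Find(int id);
    void Add(T item);
    void Remove(int id);
    void Remove(T item);
    void Update(T item);
}

// Hypothetical in-memory implementation backed by a dictionary keyed on id.
// A database-backed repository would implement the same members with queries.
public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly Dictionary<int, T> _items = new Dictionary<int, T>();
    private readonly Func<T, int> _getId;

    // The caller supplies how to read an item's id (e.g. c => c.Id).
    public InMemoryRepository(Func<T, int> getId)
    {
        _getId = getId;
    }

    // Returns null when no item has the given id.
    public T Find(int id) => _items.TryGetValue(id, out var item) ? item : null;

    // Throws ArgumentException if an item with the same id already exists.
    public void Add(T item) => _items.Add(_getId(item), item);

    public void Remove(int id) => _items.Remove(id);

    public void Remove(T item) => _items.Remove(_getId(item));

    // Replaces whatever is stored under the item's id.
    public void Update(T item) => _items[_getId(item)] = item;
}

// Hypothetical entity used only for this example.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```

Passing a key selector such as c => c.Id into the constructor is just a convenience for the sketch; the pattern itself says nothing about how ids are obtained.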

The goals are noble, and they are what originally attracted me to it. Like most coders, I imagine, I started out quite naïve about best practices. I did a lot of bad stuff with DataSets and other bizarre DAL creations. I was eager to learn better ways, both to improve my skills and to solve all of the headaches associated with working with data layers. That got me Googling and reading, and inevitably I landed on the repository pattern.

It looked like a lot of boilerplate up-front work, but the folks who constantly beat the drum and sang its praises had me convinced. I was slow to actually adopt it, as it does take discipline and dedication (as well as a new project; I was not about to rewrite an entire working DAL, however monstrous, in an ongoing project).

It decoupled the implementation details from the many other classes that needed data but shouldn’t be accessing databases directly. It made things more testable, something else I was slowly warming up to and trying to incorporate. It even just read better: what could be simpler than add, update, remove, and find?

Lately, though, I’ve been getting frustrated. It begins when you realize you lose most of the nifty features that modern ORMs have been adding: lazy loading, change tracking, connection pooling, foreign-key fix-ups, and so on. Well, you don’t always lose them; it depends on your repository. But it becomes difficult, if not impossible, to actually use them, or else you have to reimplement those features in the repository itself. All in the name of abstracting things. Well, yes, abstractions can be good, but this one means that no caller can know about these specific features, since the interface must be “one size fits all”: a lowest common denominator of all possible data access implementations.

What usually happens is that people write repository classes on top of Entity Framework and then either lose a ton of features or use them anyway, breaking the abstraction by “knowing” that in this case it’ll work. It’s rare for anyone to actually rip out EF and replace it with NHibernate or something similar, so the chances of running into problems from all the times they dodged the abstraction are slim to none.

Except when it does crop up. Case in point: somebody at some point innocently changed some code to make ToListAsync calls. Turns out, this is an Entity Framework-specific extension method on IQueryable. Well, it worked, because the code was set up to use an EF-backed repository implementation. But, of course, if you change it to anything else (as I stupidly did), it breaks immediately.
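The compiler cannot catch this kind of leak, because any extension method on IQueryable<T> type-checks against any repository that hands one out. Below is a minimal sketch, assuming the repository exposes IQueryable<T> (the usual way ToListAsync becomes callable from outside) and a modern .NET with C# 9 or later. ToListAsyncProviderOnly is a hypothetical stand-in, not EF itself: as far as I know, EF Core's ToListAsync requires the source to implement IAsyncEnumerable<T> and throws InvalidOperationException otherwise (EF6 performs a similar check against its own IDbAsyncEnumerable<T>), and the stand-in mimics that check. The in-memory IQueryable produced by AsQueryable() does not implement IAsyncEnumerable<T>, so the call compiles and then fails at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed repository shape: it exposes a queryable for callers to compose.
public interface IQueryableRepository<T>
{
    IQueryable<T> Query();
}

// In-memory implementation: AsQueryable() wraps the list in an
// EnumerableQuery<T>, which runs LINQ-to-Objects and has no async support.
public class InMemoryQueryableRepository<T> : IQueryableRepository<T>
{
    private readonly List<T> _items;

    public InMemoryQueryableRepository(IEnumerable<T> items)
    {
        _items = items.ToList();
    }

    public IQueryable<T> Query() => _items.AsQueryable();
}

public static class ProviderSpecificExtensions
{
    // Hypothetical stand-in for a provider-specific async extension: it only
    // works when the query's provider supports asynchronous enumeration.
    public static async Task<List<T>> ToListAsyncProviderOnly<T>(this IQueryable<T> source)
    {
        if (source is not IAsyncEnumerable<T> asyncSource)
            throw new InvalidOperationException(
                "The source IQueryable doesn't support asynchronous enumeration.");

        var result = new List<T>();
        await foreach (var item in asyncSource)
            result.Add(item);
        return result;
    }
}
```

The synchronous ToList on the same query works fine; only the provider-specific call fails, and only when a different implementation is swapped in behind the interface.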

I believe it matters, a lot, how you’re getting data. That doesn’t mean you can’t still abstract things somewhat, just not so far that you lose all knowledge of what’s going on. It matters whether the repository is backed by a database or other remote resource, because that sometimes has async and threading implications; an in-memory one does not have the same disk or network I/O and CPU considerations.

So what do you do in this specific case? Changing it back to ToList wrapped in a Task.Run, as I naively suggested in a knee-jerk reaction, makes the call awaitable, but the query itself still runs synchronously on a thread-pool thread, bypassing EF’s true async implementation. Presumably that true async I/O was the desired optimization, and it then goes unused. Other than that, what? Do we change the IRepository interface to add explicit XxxAsync methods and force all implementers to provide async calls (or pretend to, in the in-memory implementation case)? I don’t have a good answer other than: I guess you don’t get to use EF’s async improvements. Sorry. You lost that feature as soon as you decided to be “persistence agnostic” with your repository abstraction.
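Here is a minimal sketch of both options, assuming modern .NET. QueryHelpers, IAsyncRepository<T>, and InMemoryAsyncRepository<T> are hypothetical names chosen for illustration. Option 1 is the Task.Run wrapper; option 2 adds XxxAsync methods to the abstraction, with the in-memory implementer "pretending" by returning already-completed tasks via Task.FromResult.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Option 1 (the knee-jerk fix): the caller can await the result, but the
// query still runs synchronously, blocking a thread-pool thread for its whole
// duration. The provider's own async I/O is never used.
public static class QueryHelpers
{
    public static Task<List<T>> ToListOnThreadPool<T>(IQueryable<T> query) =>
        Task.Run(() => query.ToList());
}

// Option 2: put async methods on the abstraction itself, so every
// implementer must provide them.
public interface IAsyncRepository<T> where T : class
{
    Task<T> FindAsync(int id);
    Task<List<T>> ListAsync();
}

// The in-memory implementer has no I/O to wait on, so it does the work
// synchronously and wraps the result in an already-completed task.
public class InMemoryAsyncRepository<T> : IAsyncRepository<T> where T : class
{
    private readonly Dictionary<int, T> _items;

    public InMemoryAsyncRepository(IEnumerable<T> items, Func<T, int> getId)
    {
        _items = items.ToDictionary(getId);
    }

    // Completes immediately; yields null when the id is not present.
    public Task<T> FindAsync(int id) =>
        Task.FromResult(_items.TryGetValue(id, out var item) ? item : null);

    public Task<List<T>> ListAsync() => Task.FromResult(_items.Values.ToList());
}
```

With option 2, an EF-backed implementation could call EF’s own async methods, such as ToListAsync, inside the repository, keeping the provider-specific call behind the interface. The cost is the one described above: every implementer must now speak async, whether or not it has anything to wait on.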

I don’t think you can truly have your cake and eat it too. Either you get to use all of the nifty features your chosen ORM provides (in this case, true asynchronous enumeration of entity queries), or you aim for the lowest common denominator and forfeit them. Not every persistence method will have such a feature, and even where several do, their implementations likely differ in real and meaningful ways.

If only there were another way…

1 comment:

  1. This is serendipitous. I've been having the same thoughts myself. Why are we creating a unit of work with a bunch of repositories when the DbContext itself IS a unit of work with a bunch of repository-like constructs (our custom DbSets)? I can understand the code-organisation point, but why do we confine the DbContext to the DbSet of each repository? Should we just use the strongly typed DbContext?

    Anyway, back to my app with abstractions-aplenty which will apparently make it more easily maintainable.