NoSQL First Impressions: Object Databases Missed the Boat

I’ve spent the past few weeks here at work researching and playing with NoSQL databases (and especially MongoDB) for a new feature we’re developing that doesn’t easily fit into a relational model. And so far, I really like what I see. The profound shift away from the relational model, and the implications it has, just blow my mind. You no longer have to fragment your data to persist it. You just store it. That’s it. No more hours toiling over the design of your table schema and how to break apart the data you put together just to fit it into a relational model. You can now store your data exactly how you use it in your application, with no other special needs or impedance mismatches. Going back to an RDBMS now just seems illogical – it’s like breaking apart a camera tripod just to fit it in the same standard size case you’ve been using for years instead of just collapsing it and finding a different case that fits it better. All that effort you go through tearing down your tripod and putting it back together every time you use it is wasted and unnecessary. It’s a symptom of the larger problem that your case doesn’t fit your tripod.
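To make the "fragmenting" point concrete, here is a minimal sketch using only Python's standard library. The order data, the table names, and the `load_order` helper are all hypothetical, and a plain dictionary of JSON strings stands in for a document store; this is not MongoDB's API, just an illustration of the two shapes the same data can take.

```python
import json
import sqlite3

# Hypothetical application object: an order with its line items nested inside.
order = {
    "id": 1,
    "customer": "Ada",
    "items": [
        {"sku": "tripod", "qty": 1},
        {"sku": "case", "qty": 2},
    ],
}

# Document style: persist the structure as-is.
# (A dict of JSON strings stands in for a document collection.)
collection = {}
collection[order["id"]] = json.dumps(order)
loaded_doc = json.loads(collection[1])

# Relational style: break the order apart into two tables...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
db.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
db.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)


def load_order(order_id):
    """...and put it back together from both tables on every read."""
    row = db.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    items = db.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "id": row[0],
        "customer": row[1],
        "items": [{"sku": sku, "qty": qty} for sku, qty in items],
    }


loaded_rows = load_order(1)
```

Both loads hand back the same dictionary; the difference is how much mapping code sits between the application object and the storage.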

Once you fully get the NoSQL viewpoint and the implications it has on the future of data storage, you have one of those “why didn’t anyone think of this sooner?” moments. And yes, I know key/value stores like memcached existed long before this whole new “NoSQL” movement, but it’s not quite the same thing.

Object Databases

Object databases are an interesting case. They are in the “NoSQL” category and solve the same problem other NoSQL databases do – creating a storage mechanism that fits your data instead of making your data fit a predefined rigid storage mechanism. So why didn’t they catch on? What gives? It’s anyone’s guess why object databases didn’t catch on and achieve widespread adoption, but one thing is for sure – object databases really missed the boat while other NoSQL database technologies are running rings around them.

The primary problem is that most object databases are designed for a specific language, like Java or .NET. Some have more language support than others, but the problem is still the same – by integrating directly with a language’s object model they depend on specific language implementations, which severely limits adoption. They solve the impedance mismatch problem that RDBMSes present, but fail to provide language-agnostic data persistence and access, effectively tying their use to specific technology stacks and platforms.

The Best of Both Worlds

NoSQL databases are gaining traction like no other database technology has in over 30 years. The reason is that good NoSQL databases solve both key data storage problems: impedance mismatch AND language lock-in. By storing the data in an intermediary format like JSON (or BSON) and allowing custom file content types, they put the focus on the actual data itself instead of specific technologies or data models. The technology stack becomes irrelevant, and the walls to gaining user adoption crumble.
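A rough analogy for the difference, sketched with Python's standard library: `pickle` plays the role of a language-bound storage format and `json` the role of a language-neutral one. The `record` below is made up, and this is only an analogy, not a description of how any particular object database or NoSQL database stores data.

```python
import json
import pickle

# Hypothetical record an application wants to persist.
record = {"brand": "Acme", "legs": 3, "tags": ["camera", "tripod"]}

# Language-bound: pickle's byte format is defined by Python,
# so in practice only Python programs can read it back.
language_bound = pickle.dumps(record)

# Language-neutral: JSON is plain text that any language with a
# JSON parser can read.
language_neutral = json.dumps(record)

# Both round-trip fine inside Python; only the JSON version is
# useful to a PHP, Ruby, or Java application sharing the same store.
from_pickle = pickle.loads(language_bound)
from_json = json.loads(language_neutral)
```

The data is identical either way; what changes is who else can read it.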

The bottom line is that NoSQL databases are not just the latest fad. They represent a fundamental paradigm shift in data storage and are an entirely new way of thinking about how to store and access your data in a way that makes sense for your actual usage. NoSQL databases are absolutely something you should be paying attention to and following closely – they are a strong contender to the RDBMS model and will likely become the de facto data storage choice for most new web applications within a decade.

Comments

  1. Nice article. I envy that you can study this new stuff in your job; for most bosses/managers etc. it’d look like a waste of time until it’s completely finished.

    It’d be great if you have time to put more articles and references to NoSQL in your blog

    I have one question in mind: if this new technology is emerging, wouldn’t it be better to dedicate more time to learning it and less to developing ORM tools?

  2. That is because his “boss” is a great boss! ;-)

    Great post Vance.

  3. Great post – thanks for sharing it. Definitely agree that being language agnostic is one reason why this generation of technologies is seeing more widespread adoption than OODBs.

    I work on the MongoDB project so if you have any questions as you keep exploring, feel free to ping me.

  4. That’s all nice and well, but normalization and transactions across more than one document *are* important, and until NoSQL databases provide proper support for this they are nothing more than a toy or something that can only be applied to a small number of corner cases.

  5. You make it sound like relational databases fragment data just for kicks. Not all persistence requirements fit well into a relational model, but a lot do. One reason you ‘fragment’ data in the first place is to have one authoritative version of the data. The typical relational strategy of denormalization leaves you with multiple copies of the data which some poor DBA will have to reconcile during some data migration or upgrade. If you’re just storing documents or simple key-value pairs, that’s not a big deal. But instead of just gushing about how great NoSQL is, how about some NoSQL proponent go through the first few forms of normalization with a basic set of data, say involving companies, users, and orders, and show how much better NoSQL will be, not just for a web app, but for the whole spectrum of uses that data is put to?

  6. I want to love MongoDB (even if the name brings up bad memories of a job before the DB existed), but I’m not finding any really great tutorials (I’ll be working on one) or a hosting solution that provides it.

    For object DBs, isn’t InterSystems’ Caché supposed to be fairly good? (I’m not promoting them. Never really used their product.)

  7. So well put, Vance. I agree, the idea of designing schemas and fitting your objects to them is so antiquated. As software developers, what we really want is to persist our objects, or state, more generally. I have totally fallen in love with Mongo for this reason alone. Great post.

  8. All of the following just IMHO, of course, just like your blog post ;)

    Neither are “NoSQL” databases a general-purpose replacement for relational databases (the tone of your post seems to imply that, sorry if I got that wrong), nor have object databases “missed the boat”.

    They can’t be a general-purpose “replacement” for relational databases because they don’t do the same thing, and they don’t want to; that’s what it’s about. They want to choose different priorities than relational databases, which have their strengths and weaknesses elsewhere.

    It’s also interesting how you:

    1) seem to imply that using a relational database means an impedance mismatch per se

    and

    2) don’t see an impedance mismatch between all the NoSQL databases (except object databases) and OO code. An object database is the only database that lets you store objects without any mismatch.

    When you use relational data as relational data in your application, there is no impedance mismatch.

    There are surely many reasons why object databases did not get as widespread acceptance as relational databases but discussing them here would go a bit too far. Object databases are still strong in their particular fields, mostly embedded systems. There is nothing easier than a small footprint object database on an embedded system to persist your objects when you’re programming in an OO language on such a device.

    If I may link to a nice post by a core couchdb contributor about what NoSQL is about: http://blog.couch.io/post/511008668/nosql-is-about

    You may just be overly enthusiastic over your recent findings, but the new databases just mean more alternatives and that’s a good thing. More alternatives you can choose from when you’re looking for the storage that best fits the use case and data requirements of a particular application or even just a part of a particular application (polyglot persistence, anyone?). Each of these databases has its own set of (different) characteristics that make it suitable for some tasks and not for others.

    All this impedance mismatch talk is nice, but at the end of the day it’s nothing more than transforming/moving data between different representations; sometimes it’s more difficult, sometimes less. Relational to OO is surely not easy. Documents to OO may be easier, it looks like it, but no mismatch at all? I don’t think so. Transforming data between different representations is something we do all day in all kinds of different ways in IT. If you can avoid it sometimes or make it easier, good.

    Using an RDBMS is not “going back”. You should just make sure that it is the right tool for the job, as always. If you’re not aware yet of the *strengths* of relational databases you should spend some time learning them. Otherwise you may find yourself using some NoSQL database where a relational database would be the much better fit (yes, there are many such cases). Not everything is better with the new shiny databases ;)

    Just relax and enjoy the new choices. Don’t exchange one hammer for the other. Put them all into your toolbox.

  9. Nice topic. I view this subject as someone who lives in the object database space. I work for Versant (the guys behind the Versant object database and db4o).

    It’s been said, “necessity is the mother of invention”. I see this NoSQL “movement” as innovation driven by necessity.

    The necessity stems from the fact that in the last decade, data has grown and continues to grow out of bounds, the internet has brought unprecedented concurrency, and business value has driven software model complexity.

    RDBs are runtime relationship execution engines. Even though they’re called “relational”, they don’t actually store relations. They store discrete data and then at runtime the discrete data is “related” by performing set-based operations. That was all well and good when you were dealing with tens of thousands of records and 20 tables. However, today lots of systems are dealing with millions of objects and hundreds of classes with inheritance, recursive relationships, containment, polymorphism, etc. It’s not hard to imagine why the JOIN operation is having issues keeping up with near real-time requests in the face of these realities.

    So, inventions (or existing alternative technologies, i.e. ODBs) that help to deal with this new reality are what is represented by the spirit of NoSQL. In the end, it just comes down to using the right tool for the job. RDBs are still fine for the majority of applications, but when you are trying to build LinkedIn, Network Management, Protein-Protein analysis, Fuel flight path optimization, etc etc … these things need a better tool.

    In some cases MongoDB is the right tool, in others the Versant object database is the right tool, and in other cases something else.

    As software engineers we all need to become familiar with our choices and use the right tool for the job.

    -Robert

  10. I would also point out that in most systems dealing with remote references efficiently across networks and then updating those together as a unit of work is pretty important.

    I think it is a real problem that MongoDB does not have transactions which span linked objects. I also think it’s not good that there are no ways to optimize the load of multiple linked objects across the network … which is generally a performance killer. For those developing it, these are some of the things I would suggest taking a look at in future implementations. Lessons learned nearly a decade ago in the object database space.

    -Robert

  11. OODBMSes missed the boat for a very simple reason. It is not their boat. It’s a completely different boat that some moron wrongly labeled “NoSQL”.

    The right name on that boat should have been “Horizontally Scalable Data Stores”. And as the name suggests, the main passengers on that boat are Cassandra, Hadoop, Bigtable, Riak, Voldemort.

    Interestingly enough, the MongoDB team initially also missed the point of the modern “NoSQL” phenomenon. But they quickly learned, and now automatic sharding is their top priority for the next version.

    As for impedance mismatch, the real NoSQL camp (object databases) has been trying to sell it for decades. They failed. But this is a completely different story.

  12. I’ve been working with Mongo for a couple months now and am still very “meh” on the whole NoSQL idea. I’m doing maintenance on a system that was built on Mongo and find debugging date-related issues on Mongo much more difficult than on MySQL or Postgres – lack of development tools and lack of sophisticated querying functions are making things more tedious than I’m used to. The only “killer feature” I’ve encountered seems to be performance?

  13. As a clarification to some comments here – I don’t believe that NoSQL databases are just a drop-in replacement for any RDBMS, nor do I believe that they should be used in every circumstance, every time.

    The main gist of this post was simply that NoSQL databases often provide a way to persist your data that is closer to the way you actually use it and pass it around in your application, and eliminate a lot of setup work normalizing tables and data structures that probably isn’t that important for most web projects.

    NoSQL is simply a tool, and you should always choose your tools based on your work requirements. It just happens that this tool is rapidly increasing in relevance, and I believe will be a very significant player in the near future because of all the benefits it provides and flexibility it has.

  14. Hi Vance: Thanks for your perspective on this topic. After hearing about how many web companies at the forefront are using NoSQL to solve difficult problems, I’ve started to seriously evaluate Mongo as an alternative to MySQL. We have a number of applications that need to store data that doesn’t necessarily fit well in an RDBMS. Hearing the term ‘impedance mismatch’ really hit home.

    One question I do have: Using MySQL has been comfortable for a while, as we know that when we retrieve data from there, it will have a structure, and that structure won’t change often, because it can’t. When working with schema-less technologies, what do dev teams need to do differently or be more aware of?

  15. I couldn’t agree more. People should move away from designing schemas; fitting their objects to them is way, way old-fashioned!

    Great read, I’m off to read another!

    -Dwayne

All content copyright © 2013 Vance Lucas