MongoDB Gotchas

Most developers are coming from a background with relational database-specific experience, and then trying out some new NoSQL databases like MongoDB. Here are some “gotchas” I ran into while using MongoDB with my MySQL hat still on:

Queries are case-sensitive

Fields and queries in MongoDB are case-sensitive:

1
2
3
var test1 = db.test.find({'tags': 'jquery'}).count();
var test2 = db.test.find({'tags': 'jQuery'}).count();
test1 == test2; // Output is false - they do not query for the same information

This can cause some headaches if you don’t normalize user input ahead of time. Imagine you already have several posts tagged with ‘NoSQL’ and users enter ‘nosql’ in a tag search box. If you don’t normalize how the data is stored internally (like lowercase all tags), your user will see a much smaller set of posts than they are expecting to see. This is something you don’t have to worry or even think about with MySQL and most other relational databases.

If you can’t normalize the stored data, but still want a case-insensitive query, then you must perform a slower regular expression query:

1
db.test.find({'tags': /jquery/i}); // Note the 'i' flag for case-insensitive

Data is type-sensitive

Data stored within MongoDB knows it’s type. There is a small but significant difference between these two records:

1
2
{'count': 102}; // 'count' is stored as an int
{'count': "102"}; // 'count' is stored as a string

This is obvious to any programmer when presented like this, but what may not be obvious is that this also affects how you can query for these records:

1
2
// This returns 1 instead of 2, because it only matches the integer value
db.test.find({'count': 102}).count();

This is due to how MongoDB stores documents internally with BSON (Binary JSON), using various MongoDB data types. This means – like the point above – that you must pay attention to how you are saving data into MongoDB, because it will affect how you can query for it later.

Documents sizes are capped at 4MB each

This isn’t a big issue for most people, but it’s something to be aware of if you plan on storing large chunks of text or nesting a bunch of objects inside a single document. Nesting comments inside article documents is a particular approach that may give you pause knowing there is an upper threshold on document size.

Note that this limitation is not an issue for storing files in MongoDB because GridFS transparently divides the contents of files larger than 4MB across multiple documents.

Only one index is used per query

Simply adding more indexes doesn’t always help queries run faster. MongoDB can’t use multiple indexes together like MySQL and other RDBMS can (see “Index Merge Optimization“). This means that if a query is selecting or sorting based on multiple fields, a compound index should be created with those fields for the query to run most efficiently.

MongoDB is for high-memory and 64-bit systems only

Although MongoDB can be downloaded, compiled, and installed on 32-bit systems, it should never be run in production on them. This is because MongoDB stores all indexes in memory as well as memory-mapped files for all disk I/O for increased speed and throughput. While these two facts aren’t “gotchas” themselves – they are clearly explained on the website – it does mean that MongoDB can quickly use a lot of memory, especially as the size of your database grows and becomes significant. Since 32-bit systems have an effective 3GB memory limit of addressable memory space, they prevent you from being able to add more memory as the size of your database grows. This is also something to think about if you plan to run MongoDB on a small VPS with limited memory, even if it is 64-bit. You may be forced into a memory upgrade sooner than you are prepared for if your database is large or rapidly growing.

Subtle Differences

When starting out with MongoDB, it’s easy to draw a lot of parallels to MySQL and make a lot of assumptions based on those similarities. Collections are kind of like tables, fields are kind of like columns, a document is kind of like a row. You may find yourself going on to make more assumptions about how it’s storing data, how you can query it, what you can do with it, etc. without even knowing it. So while MongoDB has a lot of similar features and is one of the easiest NoSQL databases to transition to from a relational database, it has some very distinct differences “under the hood” that you have to be aware of and plan for ahead of time. The MongoDB documentation is well-written, and is pretty good at explaining any potential differences or “gotchas”, so make sure to read it thoroughly before making the jump – and watch that first step.

NoSQL First Impressions: Object Databases Missed the Boat

I’ve spent the past few weeks here at work researching and playing with NoSQL databases (and especially MongoDB) for a new feature we’re developing that doesn’t easily fit into a relational model. And so far, I really like what I see. The profoundness of the shift away from the relational model and the implications that has just blow my mind. You no longer have to fragment your data to persist it. You just store it. That’s it. No more hours toiling over the design your table schema and how to break apart the data you put together just to fit it into a relational model. You can now store your data exactly how you use it in your application, with no other special needs or impedance mismatches. Going back to an RDBMS system now just seems illogical – it’s like breaking apart a camera tripod just to fit it in the same standard size case you’ve been using for years instead of just collapsing it and finding a different case that fits it better. All that effort you go though tearing down your tripod and putting it back together every time you use it is wasted and unnecessary. It’s a symptom of the larger problem that your case doesn’t fit your tripod. Read More »

MySQL Series: How to Detect UTF-8 and Multi-byte Characters

Multi-byte characters can cause quite a few headaches for the unsuspecting webmaster. Sometimes all you need to do to figure out how to fix the problem is detect which database records have UTF-8 data in them and which ones do not. If you’ve been scanning records manually, stop now. Here’s a quick query to cure your ales:

Return all the rows with multi-byte characters in table posts on field title:

1
2
3
SELECT *
FROM posts
WHERE LENGTH(title) <> CHAR_LENGTH(title)

This does pretty much what it looks like: returns all the rows in the posts table where the title’s character length does not match the title’s length. This works because the LENGTH function returns the number of bytes in the string, while the CHAR_LENGTH function returns the number of characters in the string. If the string contains multi-byte characters (individual characters that are made up of more than one byte) like international UTF-8 characters, the two functions will return different numbers and will not be equal, thus including that row in your results.

Get Only Public Class Properties for the Current Class in PHP

PHP provides two built-in functions to retrieve properties of a given class – get_object_vars and get_class_vars. Both these functions behave the same exact way, one taking an object as a variable and the other taking a string class name. The tricky thing about the two functions is that they behave differently depending on the call scope, returning all of the class variables available within the called scope. So if you call either function within the current class you need properties from, all properties are returned – public, protected, and private – because the current scope has access to them all. This makes seemingly simple things like returning all the public properties within the current class a bit of a pain if you want to keep the code inside the class itself.
Read More »

Why WordPress Should Not Have Won the Open Source CMS Award

Packt Publishing announced the winners for their annual Open Source CMS Award in November, and since then I have been a bit disturbed that the 2009 winner was WordPress. My first reaction was this:

“… So a blogging platform won the content management system award? How sad is that?”

My knee-jerk “how sad is that?” reaction comes not because I don’t think WordPress is worthy, but because of what it implies about the state of other open source CMS projects. The reaction comes from the fact that a blogging platform is kicking your CMS’s ass in its own category.

Read More »

MySQL Series: Return NULL Values First With Descending Order

Sometimes there are unique situations where you need to order query results by a particular field in descending order, but also need NULL values first. The default (and logical) behavior of MySQL in this case is to return NULL values last, because in descending order they have the lowest value (none). But what if you really need to reverse this and force NULL values to the top of the result set?
Read More »

CodeWorks 2009 Dallas

CodeWorks 2009 Speaker
I was fortunate enough to be selected as the regional speaker for the Dallas CodeWorks 2009 stop by the Dallas PHP User Group through a community voting and selection process. My talk was entitled Object Oriented Apologetics, and was essentially about letting people know what good object-oriented code is, when to use it, how to use it, and more specifically why to use it over traditional procedural PHP code. Read More »

The One Character Block Comment

When debugging, I often find that I have to comment and un-comment a block of code several times during the process of trying to find out what’s going on. That used to mean typing and deleting comment block characters repetitively, but not anymore. Here’s a simple solution to that problem: Comment or un-comment an entire code block of code by typing or deleting a single character.

I was able to arrive at this solution by combining the one-line comment with the comment block in a way that takes advantage of the rules the different types of comments have to follow.
Read More »

OKC PHP User Group Reboot

The local Oklahoma City PHP User Group is re-starting with the okcCoCo as the new venue. The new meetings will be on the second Tuesday of each month, starting with Tuesday, June 09, 2009 at 6:30pm as the first official meeting. Visit the official OKC PHP User Group website to register for meeting reminders and to connect with other local PHP developers.

I will be presenting my talk on Building a Data Mapper with PHP5 and the Standard PHP Library, followed by a discussion on ORMs and whatever else comes up. The presentation will cover all the thought processes, goals, theories, and actual code that goes into building an ORM (or really any other larger project that requires more advance planning). The project that was the basis of this presentation is phpDataMapper – an open-source PHP5 data mapper ORM layer that I started in the fall of 2008. It now powers the model layer of InvoiceMore, a live web application I launched in March 2009.

This is a presentation I have given before at Tulsa TechFest.

Watch the video of this presentation online

jQuery UI Datepicker with AJAX and LiveQuery

I’ve been a little aggravated lately trying to get jQuery UI Datepicker to work correctly on dynamically added fields for creating additional line items to invoices for InvoiceMore. It works great for fields already displayed on the page, but it tends to have major issues with dynamically added fields through AJAX or AHAH. Of course it won’t work out of the box with elements added dynamically to the DOM, so we can use jQuery’s $.live() event (new in 1.3 – you previously had to use liveQuery) to make it work. The Datepicker works by binding to the focus() event by default, but as of jQuery 1.3.2, the ‘focus’ event cannot be monitored by the ‘live’ event function. So we’re stuck with a little work around:

1
2
3
4
5
6
7
<script type="text/javascript">
$(function(){
	$('input.calendarSelectDate').live('click', function() {
		$(this).datepicker({showOn:'focus'}).focus();
	});
});
</script>

You would think just a simple “$(this).datepicker()” call wrapped inside the live() event would work, but it doesn’t. Turns out that in order to get it working consistently, you have to add the ’showOn: focus’ config option as well as manually focusing on the element with the focus() event. Charming.

All content copyright © 2010 Vance Lucas | Powered by WordPress | Entries (RSS) | Comments (RSS)