Many developers come to NoSQL databases like MongoDB from a relational database background. Here are some “gotchas” I ran into while using MongoDB with my MySQL hat still on.
Queries are case-sensitive
Fields and queries in MongoDB are case-sensitive:
    var test1 = db.test.find({'tags': 'jquery'}).count();
    var test2 = db.test.find({'tags': 'jQuery'}).count();
    test1 == test2; // Output is false - they do not query for the same information
This can cause some headaches if you don’t normalize user input ahead of time. Imagine you already have several posts tagged with ‘NoSQL’ and users enter ‘nosql’ in a tag search box. If you don’t normalize how the data is stored internally (for example, by lowercasing all tags), your users will see far fewer posts than they expect. This is something you don’t have to think about with MySQL, whose default collations compare strings case-insensitively.
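Here is a minimal sketch of that normalization, assuming tags should simply be trimmed and lowercased. The `normalizeTag` helper and the `posts` collection are hypothetical names used only for illustration; the point is that the same function runs both when saving and when building the query.

```javascript
// Hypothetical helper: reduce a tag to one canonical form (trimmed, lowercased).
function normalizeTag(tag) {
  return String(tag).trim().toLowerCase();
}

// Normalize once when the document is built...
var post = {
  title: 'Intro to NoSQL',
  tags: ['NoSQL', ' MongoDB '].map(normalizeTag)
};

// ...and again when turning user input into a query.
var query = { tags: normalizeTag('nosql') };

// In the mongo shell these would then be used as (assumed collection name):
//   db.posts.save(post);
//   db.posts.find(query);
```

Because both sides go through the same function, a search for ‘nosql’, ‘NoSQL’, or ‘ NOSQL ’ produces the same query value.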
If you can’t normalize the stored data, but still want a case-insensitive query, then you must perform a slower regular expression query:
    db.test.find({'tags': /jquery/i}); // Note the 'i' flag for case-insensitive
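One thing to watch with the pattern above: it is unanchored, so it also matches tags that merely contain the text, such as ‘jquery-ui’. If the pattern is built from user input, regex metacharacters need escaping as well. The sketch below shows one way to handle both; `escapeRegExp` and `exactTagPattern` are hypothetical helper names, not part of MongoDB.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an anchored, case-insensitive pattern that matches the whole tag only.
function exactTagPattern(tag) {
  return new RegExp('^' + escapeRegExp(tag) + '$', 'i');
}

var pattern = exactTagPattern('jquery');
pattern.test('jQuery');    // true
pattern.test('jquery-ui'); // false, because the pattern is anchored

// The pattern can then be passed to the query in the shell:
//   db.test.find({'tags': pattern});
```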
Data is type-sensitive
Data stored within MongoDB knows its type. There is a small but significant difference between these two records:
    {'count': 102};   // 'count' is stored as an int
    {'count': "102"}; // 'count' is stored as a string
This is obvious to any programmer when presented like this, but what may not be obvious is that this also affects how you can query for these records:
    // This returns 1 instead of 2, because it only matches the integer value
    db.test.find({'count': 102}).count();
This is due to how MongoDB stores documents internally with BSON (Binary JSON), using various MongoDB data types. This means – like the point above – that you must pay attention to how you are saving data into MongoDB, because it will affect how you can query for it later.
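One way to stay consistent is to coerce values to the intended type at the boundary, before they reach a save or a query. The following is a minimal sketch under that assumption; `toInt` is a hypothetical helper, and rejecting anything that is not a whole number is a design choice rather than a MongoDB requirement.

```javascript
// Hypothetical helper: return an integer-valued number, or throw.
function toInt(value) {
  var n = value;
  if (typeof value === 'string' && value.trim() !== '') {
    n = Number(value.trim()); // "102" becomes 102, "abc" becomes NaN
  }
  if (typeof n !== 'number' || !isFinite(n) || Math.floor(n) !== n) {
    throw new TypeError('Expected an integer, got: ' + JSON.stringify(value));
  }
  return n;
}

// Form input usually arrives as a string; coerce it before saving...
var doc = { count: toInt('102') };

// ...and before querying, so both sides use a number.
var query = { count: toInt('102') };

// In the mongo shell:
//   db.test.save(doc);
//   db.test.find(query);
```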
Document sizes are capped at 4MB each
This isn’t a big issue for most people, but it’s something to be aware of if you plan on storing large chunks of text or nesting many objects inside a single document. Nesting comments inside article documents is one approach that may give you pause, knowing there is an upper limit on document size. (MongoDB 1.8 and later raise the limit to 16MB, but the same caution applies.)
Note that this limitation is not an issue for storing files in MongoDB because GridFS transparently divides the contents of files larger than 4MB across multiple documents.
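If you nest data that can grow without bound, a rough size guard in application code can warn you before a save fails. The sketch below is written for Node.js and uses the UTF-8 length of the JSON form as a stand-in for the real BSON size, which is only an approximation. The helper names and the safety margin are assumptions for illustration; in the mongo shell, `Object.bsonsize(doc)` reports the actual size.

```javascript
var MAX_DOC_BYTES = 4 * 1024 * 1024; // the limit quoted in this article

// Rough estimate only: byte length of the JSON text, not the true BSON size.
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// True if the estimate plus an optional safety margin stays within the limit.
function fitsInOneDocument(doc, marginBytes) {
  return approxDocBytes(doc) + (marginBytes || 0) <= MAX_DOC_BYTES;
}

var article = {
  title: 'Hello',
  comments: [{ author: 'a', body: 'First!' }]
};

if (!fitsInOneDocument(article, 64 * 1024)) {
  // One option: move the comments into their own collection instead of nesting.
  console.log('Article is close to the document size limit');
}
```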
Only one index is used per query
Simply adding more indexes doesn’t always help queries run faster. MongoDB can’t use multiple indexes together the way MySQL and other RDBMSs can (see “Index Merge Optimization”). This means that if a query selects or sorts on multiple fields, a compound index should be created on those fields for the query to run most efficiently.
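As an illustration, here is a small sketch that derives a compound index key from a query and its sort. It assumes the query uses only simple equality matches and follows the common guideline of listing matched fields before sort fields. The `indexSpecFor` helper, the `posts` collection, and the field names are hypothetical.

```javascript
// Hypothetical helper: matched fields first (ascending), then sort fields
// with the direction the sort asks for.
function indexSpecFor(query, sort) {
  var spec = {};
  Object.keys(query).forEach(function (field) {
    spec[field] = 1;
  });
  Object.keys(sort || {}).forEach(function (field) {
    if (!(field in spec)) {
      spec[field] = sort[field];
    }
  });
  return spec;
}

var query = { tags: 'nosql' };
var sort = { created: -1 };
var spec = indexSpecFor(query, sort); // { tags: 1, created: -1 }

// In the mongo shell (ensureIndex in older shells, createIndex in newer ones):
//   db.posts.ensureIndex(spec);
//   db.posts.find(query).sort(sort);
```

Two separate single-field indexes on `tags` and `created` would not serve this query as well as the one compound index.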
MongoDB is for high-memory and 64-bit systems only
Although MongoDB can be downloaded, compiled, and installed on 32-bit systems, it should never be run in production on them. This is because MongoDB keeps indexes in memory and uses memory-mapped files for all disk I/O for increased speed and throughput. While these two facts aren’t “gotchas” themselves (they are clearly explained on the website), they do mean that MongoDB can quickly use a lot of memory, especially as your database grows. Since 32-bit systems have an effective 3GB limit on addressable memory, they prevent you from adding more memory as the size of your database grows. This is also something to think about if you plan to run MongoDB on a small VPS with limited memory, even if it is 64-bit. You may be forced into a memory upgrade sooner than you are prepared for if your database is large or rapidly growing.
Subtle Differences
When starting out with MongoDB, it’s easy to draw a lot of parallels to MySQL and make a lot of assumptions based on those similarities. Collections are kind of like tables, fields are kind of like columns, a document is kind of like a row. You may find yourself going on to make more assumptions about how it’s storing data, how you can query it, what you can do with it, etc. without even knowing it. So while MongoDB has a lot of similar features and is one of the easiest NoSQL databases to transition to from a relational database, it has some very distinct differences “under the hood” that you have to be aware of and plan for ahead of time. The MongoDB documentation is well-written, and is pretty good at explaining any potential differences or “gotchas”, so make sure to read it thoroughly before making the jump – and watch that first step.




Thanks for the writeup – some good tips in there. Please keep us posted if you see any places where the documentation could use improving, or anything like that.
Actually, with the new $or operator the query optimizer will do some index merging to use multiple indexes to satisfy an $or. Check out this case:
http://jira.mongodb.org/browse/SERVER-109
Nice writeup, but I don’t believe that “Data is type-sensitive” is a real “gotcha”.
This is true of any relational database as well, where you have a “defined schema” as opposed to no schema in MongoDB. The only difference is that you’ll most likely run into insertion exceptions if your data doesn’t match the type defined in the schema.
The shift here is that for things like MongoDB, your schema is effectively defined in your code rather than in your database.
Your application should know how to store and retrieve data from the database consistently, and if done correctly, will never run into the problem you have described of having keys with different value types.
Those are definitely things to be careful with, Vance. Thanks for the writeup. And thanks Mike for adding your $0.02, you guys at 10gen rock! Another thing to consider, which I run into quite a bit, is that because of the heavy lifting done behind the scenes, you really don’t want to be sharing a mongod instance with something else heavy like MySQL.
Oftentimes I’ve seen people install it alongside MySQL or have both a master and slave mongod instance running on the same machine/VM/slice/EC2 instance. You’re going to run into resource bottlenecks eventually with that kind of setup.
@Bryan
While it’s true that you select a column type when defining a table schema in a relational database like MySQL, how the values are stored internally does not affect how you can query for that data. With relational SQL databases, all query parameters are effectively strings, even with prepared statements. There is no difference between these two queries in MySQL:
They both return the exact same dataset in MySQL. That’s the “gotcha” here. Data types in MongoDB affect how you can build your queries, whereas with relational SQL databases, they only matter if you want to use query functions like aggregates, date formatting, etc.
The two bits:
- you can use a string in place of an integer in a query, and
- case insensitive queries are the default
…in MySQL are complete mis-features. How can one claim that is sane? I would never be caught dead doing something silly like that. Who would actually expect ‘jQuery’ and ‘jquery’ to be the same thing in the database?!?
Another point: since it’s schema-less, you must be careful about typos.
One additional thing to consider is that you really need to be economical when naming fields. MongoDB stores the field names along with the values in each document, which can add up to a substantial amount of space when you have a non-trivial number of documents.
@Vance Lucas
That applies only to MySQL. In Postgres and other real databases you would have to use CAST(id AS TEXT).
@Vance Lucas
You see, to me, it feels like the “gotcha” is in MySQL, not MongoDB.
Being able to query by a number as well as by the number’s string representation feels like a strange thing.
In the rigid and strict world of traditional RDBMS, you wouldn’t expect the engine to just take either data type (or is it just me??).
As @kk points out, this is specifically a MySQL “feature”.
It might help to speed this regex up if you add an ‘o’ at the end like such:
db.test.find({'tags': /jquery/io});
which is a flag to compile this regex once.
Stuff like this works in Perl and I think in Ruby’s regex lib…
MongoDB’s regex lib is JavaScript, right? I’m really not sure whether that works in JS. Or maybe compile it ahead of time using RegExp::compile.