One of the downsides of using an ORM is that it abstracts the queries in a way that it's difficult to understand which queries are being performed; it's harder to notice inefficient queries, since we didn't actually write them!
A performance issue that is very common when using ORMs is the N+1 query problem. This anti-pattern usually occurs when trying to load related information for each item in a list of results.
Since most ORMs don't perform eager loading by default (to avoid fetching more data than necessary), it's necessary to make extra queries per item to fetch the related data. Imagine that we have a simple music store app:
class Band
include Mongoid::Document
has_many :albums
end
class Album
include Mongoid::Document
belongs_to :band
has_many :songs
end
class Song
include Mongoid::Document
belongs_to :album
end
A classic appearance of the N+1 problem would be:
albums = Album.where(year: 1970)
albums.each do |album|
puts "Album: #{album.name}, Band: #{album.band.name}"
end
In this example, we are loading a list of albums (the first query) and then loading the related band for each album, making one query per album. With a list of N albums, we make 1 query to get the albums and N queries to get the bands: a total of N+1 queries.
The problem can also appear in other scenarios, such as more complex object graph traversals. In those cases, it can be much harder to spot the cause.
To understand why this is inefficient, we need to consider the impact of latency in the response time. Even if each of the N queries is executed quickly, each query requires one database round trip. This latency will add up linearly as N increases, which can have a devastating effect in the response time.
Avoiding the N+1 problem
A solution to the N+1 problem is to eagerly load the documents that we need, so that when we access a relation it's already preloaded and doesn't trigger a query.
In Mongoid, we can do this by running an $in
query using the relation foreign keys to fetch all the related records, and then assign the relations in memory using set_relation
. This way, we only perform a single query to fetch a relation, regardless of the amount of documents returned by the first query.
band_ids = albums.map(&:band_id)
bands = Band.in(id: band_ids)
bands = bands.index_by(&:id)
albums.each do |album|
album.set_relation(:band, bands[album.band_id])
end
The algorithm changes slightly depending on the type of relation we want to include, and how the foreign keys are stored, but the idea is the same: get a list of foreign keys, use them to make a query to fetch the related documents, and assign the relations in memory using the foreign keys to match the objects.
album_ids = albums.map(&:id)
songs = Song.in(album_id: album_ids)
album_songs = songs.group_by(&:album_id)
albums.each do |album|
album.set_relation(:songs, album_songs[album.id])
end
Although eager loading the documents manually is not very complex, writing this logic every time is cumbersome, error-prone, and hard to maintain.
Fortunately, Mongoid has baked-in support to eager load relations using the includes
method, which allows to specify all the relations that we want to eager load once the query is made. The following is equivalent to the two snippets above:
albums.includes(:band, :songs)
Easy, right? This becomes extremely useful when trying to avoid those sneaky N+1 queries we were talking about 😉
Performance Considerations
Although eager loading can be helpful, it's important to be aware that:
- It takes a lot of processing to obtain the foreign keys, fetch the documents from the database, traverse them, and assign them to the objects in memory.
$in
queries are usually slower, and get slower as the amount of values increases.
Most of the times, the overhead of processing in memory and making a more complex query is lower than the latency of issuing a lot of queries separately. Rewriting the code to eager load the relations that we need will usually improve the performance.
As with any performance optimization, there might be some corner cases where eager loading is slower. It's important to run benchmarks and measure the response time to verify that it's worth it to use eager loading 📊
Always keep an eye out for this anti-pattern; accessing the database in a naive way will hurt the performance. Using a tool like bullet
can help to detect N+1 queries or unused includes, but it's better to use it as a safety net for the cases that slipped past our manual control.
Have in mind that in mongodb it's possible to embed the related documents instead of storing them in separate collections. Depending on the domain requirements, it can be a very good way to get the best out of the database, and avoid the problem entirely.
Thinking about data access from the beginning yields the best results, because it allow us to spot potential inefficiencies, and find alternative queries that perform better.
Limitations in Mongoid
Eager loading in Mongoid has some limitations:
- Criteria-only: It's only possible to use eager loading with a
Mongoid::Criteria
object. We can't leverage the functionality if we have a list of objects. - No Nested: Only direct relations can be included, nested relations can't be eagerly loaded (like
band.albums.songs
). - No Polymorphic: Polymorphic relations can't be included.
The first limitation exists because Mongoid relies on the metadata to pick the appropriate eager loading algorithm. The relation metadata allows to infer things like foreign key names, the name of the database collection, and the name of the setter method for the relation.
I have solved the other limitations in mongoid_includes
💎, which extends the includes
method to support eager loading polymorphic and nested associations.
You can check the next post where I talk about this gem and explain the motivation behind it, as well as the difficulties of extending Mongoid 😃