This morning DHH twittered about adding “batch find” to ActiveRecord in 2.3. Mike Gunderloy has a good explanation of how batch finding can be used. Definitely a nice improvement, but there is perhaps a better alternative already out there.
Stepping back, one of the attractive aspects of Rails is the simple abstraction it wraps around the database. It works beautifully until the abstraction leaks and you’re left wondering why your code is running slowly and Mongrel is chewing up memory. The problem is that the same query that used to find 500 rows now finds 50,000, and pulling in that many records fills the available memory and slows everything down. So you hack up something with limits and offsets, or borrow Jamis Buck’s trick for fast MySQL “cursors”, to get things working again. (And hopefully you think about whether the work might be better moved to an offline queue.)
Another solution is the handy pseudo cursors plugin. It has been around a while but never seems to have gotten much attention. It adds a single find_each method that behaves like the usual find, except that instead of returning an array it takes a block and yields each record to it. An example:
User.find_each(:conditions => ["created_at > ?", 3.weeks.ago]) do |user|
  user.send_welcome_message!
end
Under the hood it executes the query as normal, except that it fetches only the ids of the records. Even for a very large dataset, holding just the ids in memory is acceptable. It then requests the full records in batches and yields them one at a time. One advantage over the new “batch” methods is that pseudo cursors supports a normal :order clause. It can also be chained easily with other scopes, for example: BlogPost.recent.find_each(:conditions => "views > 100", :order => "created_at DESC", :include => [:comments])
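To make the ids-first idea concrete, here is a minimal sketch in plain Ruby. It is not the plugin's actual code: FakeTable, Record, find_ids, find_by_ids and pseudo_cursor_each are all hypothetical names invented for illustration, and an in-memory hash stands in for the database so the example runs on its own.

```ruby
# Hypothetical in-memory stand-in for a database table (not part of the plugin).
Record = Struct.new(:id, :name, :views)

class FakeTable
  attr_reader :loads

  def initialize(records)
    @rows = records.each_with_object({}) { |r, h| h[r.id] = r }
    @loads = [] # size of every batch load, kept only for illustration
  end

  # Phase 1: run the full "query" (conditions and order) but return only ids.
  def find_ids(where: nil, order_by: :id, descending: false)
    rows = @rows.values
    rows = rows.select(&where) if where
    rows = rows.sort_by { |r| r.public_send(order_by) }
    rows.reverse! if descending
    rows.map(&:id)
  end

  # Phase 2: load full records for a small list of ids, in the order given.
  def find_by_ids(ids)
    @loads << ids.size
    ids.map { |id| @rows.fetch(id) }
  end
end

# Yields every matching record while holding at most batch_size full records
# (plus the complete list of ids) in memory at any one time.
def pseudo_cursor_each(table, where: nil, order_by: :id, descending: false, batch_size: 2)
  ids = table.find_ids(where: where, order_by: order_by, descending: descending)
  ids.each_slice(batch_size) do |batch_ids|
    table.find_by_ids(batch_ids).each { |record| yield record }
  end
end

table = FakeTable.new([
  Record.new(1, "alice", 50),
  Record.new(2, "bob", 150),
  Record.new(3, "carol", 300),
  Record.new(4, "dave", 120),
  Record.new(5, "erin", 500)
])

names = []
pseudo_cursor_each(table, where: ->(r) { r.views > 100 },
                          order_by: :views, descending: true) do |record|
  names << record.name
end
```

Because the ordering is decided once, when the ids are fetched, each batch only has to come back in the same order as its slice of ids. Against a real database a lookup by a list of ids does not guarantee that order, so an implementation along these lines would presumably need to re-sort each batch to match the id list.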
A while back I pulled this plugin (which is stable but not actively developed) onto GitHub for easier hacking. I made one enhancement: it now honors the :include option for eager loading associations.