dingus

/ˈdɪŋəs/ Something whose name is unknown or forgotten

Active Record find_each using Postges cursors

March 2022

Active Record’s find_each is the go-to method for loading a large number of records in an efficient way. Under the hood it uses OFFSET and LIMIT to load records in batches and yield them to our block. One drawback to this approach is Rails requires records to be ordered by their primary key, despite Postges only requiring the ORDER to be unique1.

Not ideal!

With Postgres it’s possible to solve this problem using another tool called a cursor. These can wrap any query, ordered any way, and make it possible to fetch it’s results incrementally2. But unfortuantly Rails doesn’t have native support for Postgres cursors.

Spelunking Rails’s issues on Github your author stumbled across rails/rails#28085, which contains a monkey patch impementing a variant of find_each using cursors.

module ActiveRecord
  module Batches
    # Implements `find_each` variant using cursors. Source:
    # https://github.com/rails/rails/issues/28085#issuecomment-457909168
    #
    # Our changes:
    # * `break` condition avoids extra iteration on small sets
    # * use Active Records safe string replacement in `find_by_sql`
    # * simplify cursor definition based on Postgres defaults (more inline)
    # * remove redundant references `self`
    def batched_each(count: 1000, &block)
      transaction do
        # Cursors are created `WITHOUT HOLD` by default and cannot be used
        # outside of the transaction that created them. `NO SCROLL` specifies
        # the cursor cannot be used to retrieve rows in a non-sequential
        # fashion.
        connection.execute("DECLARE pc NO SCROLL CURSOR FOR #{to_sql}")

        loop do
          result = find_by_sql(["FETCH FORWARD ? FROM pc", count], &block)
          break if result.count < count
        end
      end
    end
  end
end