Say you have a Rails app where you have 2 tables and a join table between them. If you want to insert records into the join table, generally you’ll create tuples of table_a_id and table_b_id, but for large inserts this can be inefficient, especially when table_a might only have a few records in the tuple and table_b has many. Luckily with the help of ARel, we can insert records from a subquery (ActiveRecord Relation).

Setup

In this example, we have 2 main models Part and Widget. Our factory has a bunch of parts to make widgets with, but we’re very good at using what’s available, so we design our widgets that they can be made of many different parts that are similar in specs. For that reason we have a WidgetPart table that lets us know which parts can be included in a widget.

Here’s the migrations and models:

# create_parts.rb
class CreateParts < ActiveRecord::Migration[7.0]
  def change
    create_table :parts do |t|
      t.integer :part_type
      t.float :height
      t.float :width
      t.float :depth
      t.float :weight
      t.string :material
      t.timestamps
    end
  end
end
# create_widgets.rb
class CreateWidgets < ActiveRecord::Migration[7.0]
  def change
    create_table :widgets do |t|
      t.string :name
      t.timestamps
    end
  end
end
# create_widget_parts.rb
class CreateWidgetParts < ActiveRecord::Migration[7.0]
  def change
    create_table :widget_parts do |t|
      t.references :widget
      t.references :part
      t.timestamps
    end

    add_index :widget_parts, %i[part_id widget_id], unique: true
  end
end
class Part < ApplicationRecord
  has_many :widget_parts
  has_many :widgets, through: :widget_parts

  PART_TYPES = {
    nut: 0,
    bolt: 1,
    gear: 2,
    spring: 3,
    belt: 4,
    pulley: 5
  }.freeze

  enum part_type: PART_TYPES
end
class Widget < ApplicationRecord
  has_many :widget_parts
  has_many :parts, through: :widget_parts
end
class WidgetPart < ApplicationRecord
  belongs_to :widget
  belongs_to :part
end

Insert All Implementation

A simple way to insert parts into a new widget would be something like this which utilizes the #insert_all method.

pump = Widget.create(name: "booster pump")
pump_parts = Part.where(part_type: [:nut, :bolt], material: ["stainless steel", "aluminum"], width: 0..5, height: 0..10, depth: 0..2, weight: 0..30)
pump_parts = pump_parts.or(Part.where(part_type: [:gear], material: ["stainless steel", "aluminum"], weight: 0..200))
pump_parts = pump_parts.or(Part.where(part_type: [:belt], weight: 0..500))

tuples = pump_parts.map do |pump_part|
  {part_id: pump_part.id, widget_id: pump.id}
end
WidgetPart.insert_all(tuples)

This is a fine approach, but can be inefficient as the list of available parts grows. Now we’ll look at how to handle this as a subquery and avoid loading all of the pump_parts into the application.

ARel Implementation

ARel includes an InsertManager class that handles this exact case we want and can help speed up this bulk insertion from a subquery. Our goal is to generate sql like this:

INSERT INTO widget_parts (part_id, widget_id, created_at, updated_at)
SELECT parts.id as part_id, 1 as widget_id, NOW() as created_at, NOW() as updated_at
FROM parts
WHERE ("parts"."material" IN ('stainless steel', 'aluminum') AND ("parts"."part_type" IN (0, 1) AND "parts"."width" BETWEEN 0.0 AND 5.0 AND "parts"."height" BETWEEN 0.0 AND 10.0 AND "parts"."depth" BETWEEN 0.0 AND 2.0 AND "parts"."weight" BETWEEN 0.0 AND 30.0 OR "parts"."part_type" = 2 AND "parts"."weight" BETWEEN 0.0 AND 200.0) OR "parts"."part_type" = 4 AND "parts"."weight" BETWEEN 0.0 AND 500.0)

Where the WHERE condition is what we use to build our pump_parts relation above. Here’s how we can leverage ARel to build this query.

# pump_parts and widget code above

wp_tbl = WidgetPart.arel_table
im = Arel::Nodes::InsertStatement.new
im.relation = wp_tbl
im.columns = [wp_tbl[:part_id], wp_tbl[:widget_id], wp_tbl[:created_at], wp_tbl[:updated_at]]

# select just the fields we need
select_stmt = 'parts.id as part_id, ? as widget_id, NOW() as created_at, NOW() as updated_at'
select_stmt = ActiveRecord::Base.sanitize_sql_array([select_stmt, pump.id])
im.select = Arel.sql(pump_parts.select(select_stmt).to_sql)

ActiveRecord::Base.connection.execute(im.to_sql)

Pretty simple! As you can see im.relation defines which table we’re inserting into, im.columns defines which columns we’ll insert into and im.select deifnes the subquery we’re using.

Using this technique, it’s easy to create data in a join table from an ActiveRecord::Relation and avoid loading unnecessary data into the application.

Benchmarks

To demonstrate the increased efficiency of using subqueries for insertions instead of pulling data into the app, I added 40,000 eligible parts for our pump and compared insert speeds using both methods.

                user     system      total        real
Insert All:    0.857074   0.008193   0.865267   (1.954667)
ARel:          0.003582   0.000000   0.003582   (0.319263)

Even at this relatively small scale you can see the performance improvement.