dingus

/ˈdɪŋəs/ Something whose name is unknown or forgotten

Rack Cheatsheet

July 2022

Rack interface

Middleware

Middleware is the atom of Rack Land. It has a very simple interface! Its #initialize must accept a single argument: the next middleware in the stack. The current middleware should send #call to this middleware from its own #call method.

Middleware must also respond to #call, which must accept a single argument and must return the “magic triple”. The argument passed is the Rack “environment hash”; it contains information about the HTTP request along with Rack-specific variables. The magic triple is broken down below.

[
  200,                              # HTTP Status Code, must be an `Integer`
  {"Content-Type" => "text/plain"}, # Headers, must be a `Hash`, but can be empty
  ["Hello world."]                  # Body, must respond to `#each`
]

Applications

Applications are just a special case of middleware. Because they’re at the bottom of the stack they do not need to implement #initialize, but they must still respond to #call.
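
To make the interface concrete, here’s a minimal sketch of a middleware and application pair. The class name, header, and body are hypothetical, invented for illustration:

```ruby
# Hypothetical middleware: adds a header, then delegates to the
# next middleware (or application) in the stack.
class NoteHeader
  def initialize(app)
    @app = app # the next middleware in the stack
  end

  def call(env)
    # Delegate downwards, then decorate the magic triple on the way back up
    status, headers, body = @app.call(env)
    [status, headers.merge("X-Note" => "hello"), body]
  end
end

# The application: bottom of the stack, only needs #call
app = proc { |_env| [200, {"Content-Type" => "text/plain"}, ["ok"]] }

# Wiring the stack by hand mirrors what `use`/`run` do in a config.ru
status, headers, body = NoteHeader.new(app).call({})
```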

Rack-up DSL

The entry point to Rack! Normally added to a config.ru file and booted with a server that has a Rack handler built in (like Puma).

Running an application

Using a proc:

run proc {
  [200, {"Content-Type" => "text/plain"}, ["Hello world."]]
}

Using a class:

class HelloWorld
  def call(_env)
    [200, {"Content-Type" => "text/plain"}, ["Hello world."]]
  end
end

run HelloWorld.new

Adding middleware

Without configuration

use Rack::ShowExceptions

With configuration

use Rack::Auth::Basic, "Rack Cheatsheet" do |username, password|
  OpenSSL.secure_compare('secret', password)
end

Two applications, one config.ru

APP_1 = proc {
  [301, {"Location" => "/hello-world"}, []]
}

APP_2 = proc {
  [200, {"Content-Type" => "text/plain"}, ["Hello world."]]
}

map "/hello-world" do
  run APP_2
end

run APP_1

Warming up applications on boot

warmup do |app|
  client = Rack::MockRequest.new(app)
  client.get('/warm-cache')
end

Request handling

Using the environment hash directly can be a little unwieldy; Rack::Request is a helpful wrapper that makes it more ergonomic.

run lambda { |env|
  request = Rack::Request.new(env)

  unless request.scheme == 'https'
    return [301, {"Location" => "https://#{request.host}#{request.fullpath}"}, []]
  end

  if request.get?
    [200, {"Content-Type" => "text/plain"}, ["Secure hello world."]]
  else
    [405, {}, []]
  end
}

The response object

A convenience class for generating responses; calling #to_a or #finish will return a magic triple.

run lambda { |_env|
  response = Rack::Response.new

  response.write "<p>"

  case Time.now.hour
  when 0..11
    response.write "Good morning!"
  when 12..17
    response.write "Good afternoon!"
  when 18..23
    response.write "Good evening!"
  end

  response.write " <time>It's #{Time.now.strftime("%l:%M %P")}</time>.</p>"

  response.finish
}

Generating secure tokens from an array in Ruby

July 2022

Ruby’s SecureRandom provides a random number generator suitable for generating secure tokens. But it doesn’t allow the user to specify a source array, for example an array of characters or a wordlist.

By contrast, Ruby’s Array#sample allows us to build a random sequence from any array. But it uses a pseudo-random number generator, and the sequences it generates are deterministic (guessable) and not suitable for generating secure tokens.

Luckily these two can work together. By specifying SecureRandom as the source of randomness for Array#sample we can generate a secure token from any array.

Array("a".."z").sample(20, random: SecureRandom).each_slice(4).map(&:join).join("-")
# => "hcqo-dtnf-gsim-bawu-kvjy"

wordlist = %w[abandon ability ... zone zoo]
wordlist.sample(6, random: SecureRandom).join("-")
# => "item recycle habit almost few beach"

Array#shuffle and Array#shuffle! also accept the random argument.
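
For example, a securely shuffled deck (the card ranks here are just illustrative):

```ruby
require "securerandom"

# Shuffle with SecureRandom as the source of randomness, so the
# ordering isn't reproducible from a PRNG seed.
deck = Array("2".."9") + %w[T J Q K A]
shuffled = deck.shuffle(random: SecureRandom)
```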

Compiling the Raspberry Pi Pico SDK on Apple Silicon

June 2022

Getting up and running building apps with the Raspberry Pi Pico SDK on macOS isn’t straightforward if your machine runs on Apple silicon. Homebrew installs a broken version of the ARM GCC toolchain (v11.2). This version throws seemingly arbitrary internal compiler error: Illegal instruction errors during builds.

Fortunately the previous version (v10.3) doesn’t suffer from the same issue. But Homebrew makes it tricky to install previous versions of a cask formula, requiring users to dig out the revision and host a “tap” to install from. To make it easier to get up and running quickly I’ve pulled together the Homebrew tap, the SDK, a “hello world” application and a README into a repo:

https://github.com/clowder/rpi-pico-macos-starter

Highlighting search results with Textacular

May 2022

It’s possible to leverage Postgres’s ts_headline to add highlighting of matched fragments in search results. With one caveat: the search fields need to be declared ahead of time.

To highlight a basic_search, each column to be highlighted needs to be SELECT-ed using ts_headline, with the query string parsed by plainto_tsquery.

class Post < ApplicationRecord
  def self.search(query)
    basic_search(title: query)
      .select("#{sanitize_sql_array(["ts_headline(title, plainto_tsquery(?))", query])} as title_highlighted")
  end
end

Using this blog’s post titles for testing we see:

[0] pry(main)> Post.search("server Rack").map(&:attributes)
  Post Load (3.1ms)  SELECT "posts".*, COALESCE(ts_rank(to_tsvector('english', "posts"."title"::text), plainto_tsquery('english', 'server\ Rack'::text)), 0) AS "rank35483819984107261", ts_headline(title, plainto_tsquery('server Rack')) as title_highlighted FROM "posts" WHERE (to_tsvector('english', "posts"."title"::text) @@ plainto_tsquery('english', 'server\ Rack'::text)) ORDER BY "rank35483819984107261" DESC
=> [{"id"=>4,
  "title"=>"Server-sent events with Rails and Rack hijack",
  "created_at"=>Mon, 30 May 2022 17:30:29.806047000 UTC +00:00,
  "updated_at"=>Mon, 30 May 2022 17:30:29.806047000 UTC +00:00,
  "rank35483819984107261"=>0.085297264,
  "title_highlighted"=>"<b>Server</b>-sent events with Rails and <b>Rack</b> hijack"}]

To make this work for an advanced_search the query string should be passed to to_tsquery instead, and for a web_search to websearch_to_tsquery.
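
A sketch of the advanced_search case, mirroring the earlier Post example; the method name is hypothetical and this is untested here:

```ruby
class Post < ApplicationRecord
  # Same shape as the basic_search version, swapping
  # plainto_tsquery for to_tsquery.
  def self.advanced_search_with_highlight(query)
    advanced_search(title: query)
      .select("#{sanitize_sql_array(["ts_headline(title, to_tsquery(?))", query])} AS title_highlighted")
  end
end
```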

Cheap context switching with git-worktree

April 2022

Git supports multiple checkouts from a single repository, giving you several working directories all linked to the same repository.

For argument’s sake let’s say you’re working on a feature branch and spot a typo. Damn! Instead of committing work-in-progress just to switch branches, git-worktree can check out another branch into a temporary working directory.

$ git worktree add ../copy-changes main
$ cd ../copy-changes

Once complete you return to your main working directory and clean up.

$ cd ../web-app
$ git worktree remove ../copy-changes

🧙🏻‍♂️

Active Record find_each using Postgres cursors

March 2022

Active Record’s find_each is the go-to method for loading a large number of records in an efficient way. Under the hood it uses OFFSET and LIMIT to load records in batches and yield them to our block. One drawback to this approach is that Rails requires records to be ordered by their primary key, despite Postgres only requiring the ORDER to be unique.

Not ideal!

With Postgres it’s possible to solve this problem using another tool: a cursor. Cursors can wrap any query, ordered any way, and make it possible to fetch their results incrementally. But unfortunately Rails doesn’t have native support for Postgres cursors.
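
At the SQL level the pattern looks something like this; the table, ordering, and batch size are illustrative:

```sql
BEGIN;

-- Declare a cursor wrapping an arbitrarily ordered query
DECLARE pc NO SCROLL CURSOR FOR
  SELECT * FROM posts ORDER BY published_at DESC;

-- Pull the results back in batches
FETCH FORWARD 1000 FROM pc;
FETCH FORWARD 1000 FROM pc;

-- Created `WITHOUT HOLD`, the cursor is cleaned up with the transaction
COMMIT;
```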

Spelunking Rails’s issues on GitHub, your author stumbled across rails/rails#28085, which contains a monkey patch implementing a variant of find_each using cursors.

module ActiveRecord
  module Batches
    # Implements `find_each` variant using cursors. Source:
    # https://github.com/rails/rails/issues/28085#issuecomment-457909168
    #
    # Our changes:
    # * `break` condition avoids extra iteration on small sets
    # * use Active Records safe string replacement in `find_by_sql`
    # * simplify cursor definition based on Postgres defaults (more inline)
    # * remove redundant references `self`
    def batched_each(count: 1000, &block)
      transaction do
        # Cursors are created `WITHOUT HOLD` by default and cannot be used
        # outside of the transaction that created them. `NO SCROLL` specifies
        # the cursor cannot be used to retrieve rows in a non-sequential
        # fashion.
        connection.execute("DECLARE pc NO SCROLL CURSOR FOR #{to_sql}")

        loop do
          result = find_by_sql(["FETCH FORWARD ? FROM pc", count], &block)
          break if result.count < count
        end
      end
    end
  end
end

Recursive SQL

March 2022

This week I encountered SQL’s recursive query syntax for the first time, using the WITH RECURSIVE common table expression.

Given a sample hierarchy of book genres we can build up a table of breadcrumbs:

id | name            | parent_id
---|-----------------|----------
 1 | Science Fiction | NULL
 2 | Dystopian       | 1
 3 | Cyberpunk       | 2
 4 | Space opera     | 1

WITH RECURSIVE lineages AS (
  -- The non-recursive base case, top-level parents only
  SELECT
    ARRAY[genres.name] AS genre_names,
    genres.id AS tail_id
  FROM genres
  WHERE genres.parent_id IS NULL

  UNION ALL

  -- Recursively join sub-genres to their parent
  SELECT
    lineages.genre_names || genres.name AS genre_names,
    genres.id AS tail_id
  FROM genres
  INNER JOIN lineages ON genres.parent_id = lineages.tail_id
)

SELECT ARRAY_TO_STRING(lineages.genre_names, ' → ') AS breadcrumb
FROM lineages;

The result contains every generation of the recursion.

breadcrumb
Science Fiction
Science Fiction → Space opera
Science Fiction → Dystopian
Science Fiction → Dystopian → Cyberpunk

Early in my career I saw SQL as something to avoid or abstract away, but with time I’ve come to love munging data using SQL.

Server-sent events with Rails and Rack hijack

February 2022

Server-sent events (SSE) are a simple way to push data to clients over plain old HTTP, and Rails has provided a tidy DSL for SSE (via ActionController::Live) since Rails 4.

Unfortunately long-running HTTP connections in Rails controllers tie up server threads, causing incoming requests to queue. Borrowing from Action Cable, it’s possible to move these long-running connections to their own threads and put those server threads back to work serving incoming requests.

The secret sauce is Rack “hijack”, which lets us take control of the actual TCPSocket backing the incoming request. When combined with the myriad concurrency primitives in modern Rails apps (via concurrent-ruby) it’s possible to handle as many open connections as system RAM and ulimit will allow.

class ApplicationController < ActionController::API
  def stream
    # Get the `TCPSocket` instance backing the request
    io = request.env["rack.hijack"].call

    # Send HTTP response line and relevant headers
    io.write(
      "HTTP/1.1 200\r\n" \
      "Content-Type: text/event-stream\r\n" \
      "Cache-Control: no-cache\r\n" \
      "\r\n"
    )

    # Periodically spawn a thread to send a keepalive
    keepalive = Concurrent::TimerTask.execute(execution_interval: 5) do
      io.write(":keepalive\n\n")
    end

    # Watch for and handle failed keepalives
    keepalive.add_observer do |_time, _result, ex|
      # `next`, not `break`: this block runs after #add_observer has returned
      next unless ex.present?

      if ex.is_a?(Errno::EPIPE)
        # We expect "broken pipe" errors if we've written to a closed socket
        logger.debug("Client disconnected")
      end

      # Stop the timer task spawning new threads
      keepalive.shutdown

      # Close the socket
      io.close

      # Dereference everything so it can be garbage collected
      io = keepalive = nil
    end
  end
end

Testing our new action with curl we see the following:

$> curl -v --no-buffer http://localhost:3000/
*   Trying ::1:3000...
* Connected to localhost (::1) port 3000 (#0)
> GET / HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< Content-Type: text/event-stream
< Cache-Control: no-cache
* no chunk, no close, no size. Assume close to signal end
<
:keepalive

:keepalive

By “hijacking” the socket and passing it to a separate thread for sending data it’s possible to hold open as many connections as ulimit or system memory will allow, even on a single-threaded server, while still serving regular requests.

Reusing the configured Action Cable pub/sub adapter, available through the global ActionCable.server.pubsub, it’s possible to subscribe to and deliver events to clients in near realtime.

class ApplicationController < ActionController::API
  def stream
    # Get the `TCPSocket` instance backing the request
    io = request.env["rack.hijack"].call

    # Handler for new broadcasts
    on_message = ->(data) { io.write("data: #{data}\n\n") }

    # Send HTTP response line and relevant headers
    io.write(
      "HTTP/1.1 200\r\n" \
      "Content-Type: text/event-stream\r\n" \
      "Cache-Control: no-cache\r\n" \
      "\r\n"
    )

    # Subscribe to the "/sse/test" channel
    ActionCable.server.pubsub.subscribe("/sse/test", on_message)

    # Periodically spawn a thread to send a keepalive
    keepalive = Concurrent::TimerTask.execute(execution_interval: 5) do
      io.write(":keepalive\n\n")
    end

    # Watch for and handle failed keepalives
    keepalive.add_observer do |_time, _result, ex|
      # `next`, not `break`: this block runs after #add_observer has returned
      next unless ex.present?

      if ex.is_a?(Errno::EPIPE)
        # We expect "broken pipe" errors if we've written to a closed socket
        logger.debug("Client disconnected")
      end

      # Unsubscribe from the "/sse/test" channel
      ActionCable.server.pubsub.unsubscribe("/sse/test", on_message)

      # Stop the timer task spawning new threads
      keepalive.shutdown

      # Close the socket
      io.close

      # Dereference everything so it can be garbage collected
      io = keepalive = on_message = nil
    end
  end
end

Broadcasting from the Rails console:

$> bin/rails c
Loading development environment (Rails 7.0.2)
irb(main):001:0> ActionCable.server.pubsub.broadcast("/sse/test", {"foo" => "bar"}.to_json)
=> 1

In curl we see the following:

$> curl -v --no-buffer http://localhost:3000/
*   Trying ::1:3000...
* Connected to localhost (::1) port 3000 (#0)
> GET / HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< Content-Type: text/event-stream
< Cache-Control: no-cache
* no chunk, no close, no size. Assume close to signal end
<
:keepalive

:keepalive

data: {"foo":"bar"}

:keepalive

📬

CURRENT_TIMESTAMP

February 2022

Turns out calling CURRENT_TIMESTAMP within a transaction will always return the time the transaction began. In my mental model I’d always thought of it as the clock time of the statement.

Postgres aliases CURRENT_TIMESTAMP to transaction_timestamp(), which is a more intention-revealing name. There’s also the well-named clock_timestamp() and statement_timestamp(), both of which do exactly what they say on the tin.
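
A quick way to see the difference, sketched as a psql session (the pg_sleep is just there to make the gap obvious):

```sql
BEGIN;

-- CURRENT_TIMESTAMP is frozen at the start of the transaction;
-- clock_timestamp() reads the wall clock.
SELECT CURRENT_TIMESTAMP, clock_timestamp();

SELECT pg_sleep(2);

-- CURRENT_TIMESTAMP returns the same value as before,
-- while clock_timestamp() has moved on by roughly two seconds.
SELECT CURRENT_TIMESTAMP, clock_timestamp();

COMMIT;
```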