Postgres: Deadlock resolution

Within the confines of an application, usage patterns dictate how likely concurrent data updates are. Such updates can put the same resources in demand from more than one process at once. Consider a model wired up like this:


 
  # ======> AR relationships <======

  has_one :address, as: :addressable, dependent: :destroy
  has_one :contact, as: :contactable, dependent: :destroy
  has_one :commission_split, dependent: :destroy
  belongs_to :company, touch: true
  belongs_to :user, touch: true
  belongs_to :office_manager, touch: true
  has_many :approvals, as: :approvable, dependent: :destroy
  has_many :assigned_users, as: :assignable, dependent: :destroy
  has_many :assets, as: :attachable, dependent: :destroy
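
Note the belongs_to associations with touch: true: saving one of these records also bumps updated_at on its company, user, and office_manager rows, inside the same transaction. Each save therefore locks several rows, and two requests that lock overlapping rows in different orders can end up waiting on each other. A contrived sketch of that, assuming a simple Order model with a price column (hypothetical names, not this app's actual code):


# Process 1: locks row 1, then wants row 2
Order.transaction do
  Order.find(1).update!(price: 100)  # row 1 is now locked
  Order.find(2).update!(price: 200)  # blocks, waiting on process 2
end

# Process 2, running concurrently: locks row 2, then wants row 1
Order.transaction do
  Order.find(2).update!(price: 300)  # row 2 is now locked
  Order.find(1).update!(price: 400)  # blocks on process 1: deadlock
end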
 

That is the scenario: one transaction holds a row and waits on a second row, while another transaction holds the second row and waits on the first. A deadlock. The immediate need is to resolve the situation so users do not see a stream of 500 errors. First stop: identify the PostgreSQL process. From the manual:


$ ps auxww | grep ^postgres
postgres   960  0.0  1.1  6104 1480 pts/1    SN   13:17   0:00 postmaster -i
postgres   963  0.0  1.1  7084 1472 pts/1    SN   13:17   0:00 postgres: stats buffer process   
postgres   965  0.0  1.1  6152 1512 pts/1    SN   13:17   0:00 postgres: stats collector process   
postgres   998  0.0  2.3  6532 2992 pts/1    SN   13:18   0:00 postgres: tgl runbug 127.0.0.1 idle
postgres  1003  0.0  2.4  6532 3128 pts/1    SN   13:19   0:00 postgres: tgl regression [local] SELECT waiting
postgres  1016  0.1  2.4  6532 3080 pts/1    SN   13:19   0:00 postgres: tgl regression [local] idle in transaction

You can see the infrastructure processes associated with Postgres. Subtract those, and the waiting process is easy to spot.
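
On newer Postgres (9.6 and up) there is a more direct route than eyeballing ps: pg_stat_activity combined with pg_blocking_pids reports exactly which backend is stuck and who is holding it up. A quick sketch:


SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;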

All that remains is to cancel the backend with the waiting process id, from psql:


$ psql
postgres=# SELECT pg_cancel_backend(1003);

pg_cancel_backend cancels that backend's current query; if the whole session needs to go, pg_terminate_backend(1003) is the heavier hammer.

Now that the immediate concern is mitigated, it is time to change our code so this does not happen again, or at least does not fail with the same results. If there is not a high likelihood of recurrence, we would use Optimistic Locking. If there is a high likelihood of recurrence, we would use Pessimistic Locking.

Optimistic Locking is implemented by Rails, whereas Pessimistic Locking is implemented at the database level.

The Optimistic part means that we can serve as many users as possible: everyone is allowed to read, and the locking mechanism only engages when we attempt an update. The first step is to create a migration that gives your model a designated lock column:


class AddLockVersionToOrder < ActiveRecord::Migration
  def up
    if ActiveRecord::Base.connection.table_exists?('orders')
      unless ActiveRecord::Base.connection.column_exists?(:orders, :lock_version)
        # default: 0 matters: a NULL lock_version in existing rows would never
        # match the WHERE lock_version = 0 clause Rails generates, so every
        # update of those rows would raise StaleObjectError
        add_column :orders, :lock_version, :integer, default: 0
      end
    end
  end

  def down
    if ActiveRecord::Base.connection.table_exists?('orders')
      if ActiveRecord::Base.connection.column_exists?(:orders, :lock_version)
        remove_column :orders, :lock_version
      end
    end
  end
end

Now an ActiveRecord::StaleObjectError will be raised if two processes attempt to modify the same object at the same time. This works because internally lock_version is incremented every time the record is updated. If, when the update is attempted, the in-memory object's lock_version no longer matches the lock_version stored in the database, the error is raised.
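
To make the mechanics concrete, a minimal sketch of the race, assuming the orders table from the migration above and a price column:


order_a = Order.find(1)  # loads the row, lock_version is 0
order_b = Order.find(1)  # same row, also sees lock_version 0

order_a.update!(price: 150.00)  # UPDATE ... WHERE id = 1 AND lock_version = 0
                                # succeeds; lock_version is now 1

order_b.update!(price: 200.00)  # WHERE id = 1 AND lock_version = 0 matches
                                # zero rows: ActiveRecord::StaleObjectError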

This means it is the application's responsibility to handle the error:


  def set_order_price
    begin
      self.order.price = 100000.00
      self.order.save!
    rescue ActiveRecord::StaleObjectError
      logger.error "Changes to this record: #{self.order.inspect} were made since you started editing. Please try again."
      raise "General Error Occurred"
    end
  end
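
Logging and re-raising is one option. Another common pattern, sketched here, is to reload and retry once, so the update is reapplied on top of the other writer's changes:


  def set_order_price
    retries ||= 0
    self.order.price = 100000.00
    self.order.save!
  rescue ActiveRecord::StaleObjectError
    self.order.reload             # pick up the competing changes
    retry if (retries += 1) <= 1  # retry once, then give up
    raise
  end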

Pessimistic Locking kicks in at the database level when invoking either lock! or with_lock, which wraps the work inside a transaction. The row lock is held until that transaction commits or rolls back, not merely until save! returns.
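
A minimal sketch of the pessimistic variant, reusing the hypothetical order from above:


order = Order.find(1)
order.with_lock do
  # with_lock opens a transaction and re-reads the row with
  # SELECT ... FOR UPDATE, so competing writers block here
  order.price = 100000.00
  order.save!
end
# the row lock is released here, when the transaction commits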