Hunting Memory Bloat in OPEN Tech

Advice and case studies on getting OPEN Tech to run more memory-efficiently, so we can serve more members, faster.

14 February 2019 13:04

Low-hanging Fruit: Config Tweaks

A quick fix

Ruby allocates memory using the C memory allocator function, 'malloc'. Malloc's default configuration does not play well with highly threaded Ruby apps (such as Puma server processes or Sidekiq workers). You can reduce the memory consumption of such processes by around 30-50% by adding the following environment variable to your configuration:

    MALLOC_ARENA_MAX=2

If you are curious about exactly what this does, this article has the gory details.

The Nuclear Option

Jemalloc is a drop-in replacement for malloc which reportedly causes much less memory fragmentation on typical Ruby workloads. Installing it may require compiling your own Ruby interpreter binaries, or using a custom Heroku buildpack. We have not tried this at 38 Degrees.
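If you do go down this route, one quick way to check whether the Ruby you are running was actually built against jemalloc is to inspect its build configuration (a small sketch; which RbConfig keys carry the linker flags can vary between Ruby versions):

    # Look for '-ljemalloc' in the output.
    require 'rbconfig'
    puts RbConfig::CONFIG['LIBS']
    puts RbConfig::CONFIG['MAINLIBS']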

Tracking Down Memory Bloat

The following tools have been useful in diagnosing memory bloat issues:

The 'memory_profiler' gem
    require 'memory_profiler'

    report = MemoryProfiler.report do
      # your code to be profiled goes here
    end
    report.pretty_print

This will print a summary of memory usage, broken down by object types, by gem, and by file.
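If the output is too long to read comfortably in a terminal, the report can also be written to a file (the path here is just an example, using the `report` object from the snippet above):

    report.pretty_print(to_file: 'tmp/memory_report.txt')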

Print-debugging of memory consumption

You can find process size (in KiB) with:

    rss = `ps -o rss= -p #{$$}`.to_i
    

Growth caused by a particular endpoint can be logged with something like this (Padrino example):


    class App < Padrino::Application
      before do
        Thread.current[:pre_request_rss] = `ps -o rss= -p #{$$}`.strip.to_i
      end

      after do
        post_request_rss = `ps -o rss= -p #{$$}`.strip.to_i
        Padrino.logger.warn "Memory growth #{post_request_rss - Thread.current[:pre_request_rss]} KiB (to #{post_request_rss} KiB) handling #{request.url}"
      end
    end

This technique is particularly useful when you have no idea where the excessive memory usage might be coming from. Once you have narrowed it down to one endpoint, you can progressively shrink the lines of code covered by the memory logging until a clear culprit emerges.
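For example, a small helper (the name log_rss_growth is our own, not part of any library) can be wrapped around progressively smaller blocks of code inside the suspect endpoint:

    def log_rss_growth(label)
      rss_before = `ps -o rss= -p #{$$}`.to_i
      result = yield
      rss_after = `ps -o rss= -p #{$$}`.to_i
      Padrino.logger.warn "RSS grew #{rss_after - rss_before} KiB (now #{rss_after} KiB) around #{label}"
      result
    end

    log_rss_growth('build member id list') do
      # suspect code goes here
    end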

Note that this technique shows whole-process memory usage, and all threads will contribute to it. A long-running endpoint will therefore affect the memory usage stats of unrelated endpoints; bear this in mind if you are seeing improbable outliers in these stats. If you are testing locally, you can start Puma with a single thread to avoid this.

Case study: Identity list pushing sometimes freezes

We first encountered this issue when creating lists from searches. List pushing would appear to 'freeze', but would generally complete if we restarted Sidekiq. On further investigation, we noticed that the PushSearchToListWorker run coincided with rapid growth in memory consumption, which ended with Heroku killing the process for massively exceeding its memory quota (an R15 error).

Reading Search#push_to_list, we looked for locations where a large data set might be loaded into memory. We found a possible culprit: the loading of an entire Search's member_ids, in the form

    [{'id': 1234}, ...]

Using MemoryProfiler, we can demonstrate that such a data structure will use around 203 MB of RAM (around 20% of the Sidekiq process's memory quota) for a data set of 1 million members:

    MemoryProfiler.report { (0...1000000).map { |n| { id: n } } }
    =>
    ...
    @total_allocated=1000003,
    @total_allocated_memsize=203636472

Also note that 1000000 hash objects must be created (@total_allocated). We can simplify this data structure by getting rid of the hashes, and just using a list of member ids:

    MemoryProfiler.report { (0...1000000).to_a }
    =>
    ...
    @total_allocated=2,
    @total_allocated_memsize=11636392,

Now we are storing essentially the same data in only 12 MB of RAM, and have allocated only 2 Ruby objects (rather than a million). The change to the Identity code is to replace:

    result = ActiveRecord::Base.connection.execute(sql)

with

    result = ActiveRecord::Base.connection.execute(sql).column_values(0)

And to make the appropriate changes where 'result' is used.
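A minimal illustration of the shape difference (the table and column names here are hypothetical, and on the PostgreSQL adapter the raw values may come back as strings):

    result = ActiveRecord::Base.connection.execute('SELECT id FROM members LIMIT 3')

    result.to_a              # one Hash per row, e.g. [{"id" => ...}, ...]
    result.column_values(0)  # a single flat Array holding just the id column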

But why did this change reduce memory consumption so massively? It boils down to the details of how Ruby allocates memory for objects. Remember that the list of a million hashes required allocating just over 1000000 objects, while the list of a million numbers required only 2 objects to be created. You might expect a list of a million items to need at least a million objects, but Ruby lists actually contain references (pointers) to the objects they hold, and some Ruby values are so simple that they can be encoded directly inside the reference: nil, true, false, small Integers and most Floats. This means that a list of 1000000 integers only takes up the space required for 1000000 references, which are 8 bytes each: 8 * 1000000 = 8 MB. Wasted space from how lists grow their allocated capacity probably brings us up to the 12 MB we measured.
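You can see this 'immediate value' behaviour directly with ObjectSpace (a quick illustration; exact sizes vary by Ruby version):

    require 'objspace'

    # Immediate values live inside the reference itself, so they report no
    # separate heap allocation; a Hash is a real heap object.
    ObjectSpace.memsize_of(1234)         # => 0
    ObjectSpace.memsize_of(nil)          # => 0
    ObjectSpace.memsize_of({ id: 1234 }) # => a positive number of bytes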

The list of hashes didn't benefit from this economy, as each hash requires its own space:

    MemoryProfiler.report { {id: 1} }
    =>
    ...
    @total_allocated=1
    @total_allocated_memsize=192

A hash with a single key needs 192 bytes of memory, so 1 million of these will take 192 MB of space. Add the space required for the list of references itself, 8 MB, and we get a round 200 MB. This theoretical number is very close to the measured 203 MB.

For hashes with more keys, or with keys or values that require memory to be allocated (i.e. anything other than nil, true, false, small Integers or Floats), much more than 192 bytes per hash may be required.
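For instance, using string keys and values means each string is its own allocation on top of the hash (a quick illustration; exact numbers depend on the Ruby version):

    MemoryProfiler.report { { 'id' => '1234', 'name' => 'Jane' } }.pretty_print
    # expect several allocated objects (the Hash plus each String), and a larger
    # total memsize than the single-symbol-key hash above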

 

Case study: Speakout edit page loads very slowly

Opening a Campaign edit page in Speakout (/campaigns/:id/edit) was very slow (around 10-second page load times), and we suspected it was also causing high memory usage on the Speakout web process. Additionally, once the page had loaded, it did not respond smoothly to user interaction.

Looking at the source HTML of the edit page revealed a very probable cause: a <select> input had been rendered with one <option> for each active Campaign in the database: around 5000 options! Similarly huge <select> inputs had been generated for 'decisions' and 'custom layouts'. Complete Campaign objects were loaded, even though only two fields were used to build the select input. A look at schema.rb showed that the Campaign table has over 90 columns! This led us to an initial fix that was similar in approach to the previous case study: generate a structure containing only the data you need. We replaced:

   Campaign.where('finishes_at > ? OR finishes_at IS NULL', DateTime.now).order(internal_name: :desc)

with

   Campaign.where('finishes_at > ? OR finishes_at IS NULL', DateTime.now).order(internal_name: :desc).select(:id, :internal_name)

This greatly improved memory consumption, but the generated page source was still huge, causing sluggish browser behaviour and still unacceptable page load times.

Finally, we ripped out the statically generated <option> sets, replacing them with 'selectize' searchable ajax widgets. In the ajax endpoints, we ensured that only the required fields were loaded, and no more than 100 results would be returned:

    Campaign.where('(finishes_at > ? OR finishes_at IS NULL) and internal_name ILIKE ?', DateTime.now, "%#{query}%")
            .order(internal_name: :desc)
            .limit(100)
            .select(:id, :internal_name)

The result: fast page build times, low memory usage, and no browser slow-down.

Programming Memory-Consciously

To avoid creating issues like these, it is worth familiarizing yourself with the typical sizes of Ruby objects; MemoryProfiler.report may be useful for this. When bulk-loading collections, have an idea of how large the collection might grow in production, and of the memory consumption of a single object. Do you need to load complete entities, or will just a few fields do? Often, memory-efficient code is also more CPU-efficient and involves more efficient database queries.
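For example, when only one or two columns are needed, ActiveRecord's pluck avoids instantiating full model objects (the model and scope here are illustrative, not actual OPEN Tech code):

    # Instantiates a full Member object (every column) for each row:
    Member.where(active: true).map(&:id)

    # Loads just the ids, as a plain array of integers:
    Member.where(active: true).pluck(:id)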

Comments

James Hulme
MALLOC_ARENA_MAX=2

So much effect from such a small config change

Cindy Xie
Amazing!