テストステ論

The president of the High-Testosterone Association brings you information about testosterone.

(writeboost report) how read-caching works

This post explains in detail how writeboost's read-caching works. I recently received an inquiry about write-around caching with writeboost, and Issue 111 remains open, complaining that read-caching leads to bad blocks in some specific situation that isn't reproducible with an arbitrary test script.

As of v2.2.3, read-caching is marked experimental.

read_cache_threshold (int) [Experimental]
  accepts: 0..127
  default: 0 (read caching disabled)
Consecutive reads totaling more than read_cache_threshold * 4KB won't be staged.

This is not because I think the code is immature. I've written enough test code to cover all the code paths of read-caching, both black-box and white-box. But my own testing alone isn't enough to unmark it: since kernel software sits close to the hardware, it should be tested on many different machines and, of course, work on all of them. One such report is Issue 111 by Dmitry, and unfortunately it claims a bug.

I am writing this post as a documentation draft; the aim is to make all users feel comfortable using read-caching in production.

writeboost is log-structured writeback-caching software, so it may sound a bit strange that it can support read-caching. The idea is to implement read-caching as lightweight write-caching. The basic flow is:

  1. Only 4KB (full-sized) read requests are cached.
  2. The read data is captured into read-cache cells in the endio callback (read_cache_cell_copy_data). There are 2048 cells preallocated. reserve_read_cache_cell is called when a read request results in a cache miss (the data is then read from the backing device) and reserves one of the cells. In endio, the read data payload is copied into the reserved cell.
  3. Once all cells are occupied, writeboost "injects" the read data captured in the cells into the write-caching path (inject_read_cache). In effect, read requests turn into writes. But this code path is simpler than the actual write path because the data is guaranteed to be 4KB (never partial), and there is no chance of overwriting an existing cache block: a captured cell is cancelled when write-caching writes to the same address (might_cancel_read_cache_cell).

struct read_cache_cell {
    sector_t sector;
    void *data;      /* the captured 4KB read payload */
    bool cancelled;  /* if true, skip this cell at injection */
    struct rb_node rb_node;
};

With the tunable parameter read_cache_threshold set, read-caching detects sequentiality within the 2048 cells and cancels the cells that are sequential along the sector axis. The cells are maintained in an rb-tree so that they can be taken out in ascending sector order, which makes sequentiality easy to detect.

The struct dirtiness indicates the dirtiness of a cache block. is_dirty is set false if the block is read-cached and doesn't need to be written back. data_bits is a bit mask of the valid regions in the 4KB cache block. Therefore every read-cached block has is_dirty = false and data_bits = 255. This way, dirty and clean cache data can be mixed.

struct dirtiness {
    bool is_dirty;
    u8 data_bits;
};

In summary, read-caching is implemented on top of write-caching, so nothing special had to be added. The read-caching code is only about 400 lines out of roughly 4000 in total. The algorithm is conservative so that it can be implemented safely. So why not use read-caching?