テストステ論

The chairman of the High Testosterone Association brings you information about testosterone.

(writeboost report) How to reduce the resume time?

If you are a user of Writeboost, you may have been surprised that resuming a Writeboost-ed logical volume (for example, on rebooting the system) takes very long. This is because Writeboost reads all the logs on the cache device (and computes checksums to compare against what is stored on the device), so the time is proportional to the size of the cache device.

This article first explains what Writeboost does on resuming and then how to shorten the resume time.

Resuming the logs (or Log replay)

On resuming (under resume_cache()), Writeboost first searches the device for the "possibly" oldest log, and then proceeds to the newer ones. For each log, it computes a checksum from the log's metadata and its data section (which is typically 508KB in size). Comparing this with the checksum written in the on-disk metadata tells whether the log was written without failure (e.g. a torn write). If it finds a broken log, it stops resuming at that point, which means the storage rolls back to the state just before that log.
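To make the verify-then-compare step concrete, here is a minimal sketch in plain C. The struct layout, field names, and the byte-sum checksum are all illustrative stand-ins (the actual driver has its own on-disk format and checksum function), but the flow of recomputing a checksum over the metadata and data section and comparing it with the recorded one is the same:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk log header; the field names are illustrative,
 * not Writeboost's actual struct layout. */
struct log_header {
    uint64_t id;        /* monotonically increasing log ID */
    uint32_t checksum;  /* checksum over the metadata and data section */
};

/* Stand-in checksum (a simple 32-bit byte sum); the real driver
 * uses its own checksum function. */
static uint32_t calc_checksum(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}

/* Recompute the checksum from the log's metadata and data section. */
static uint32_t log_checksum(const struct log_header *hdr,
                             const void *data, size_t data_len)
{
    uint32_t sum = calc_checksum(&hdr->id, sizeof(hdr->id));
    sum += calc_checksum(data, data_len);
    return sum;
}

/* A mismatch with the recorded checksum indicates the log was not
 * written completely (e.g. a torn write). */
static int log_is_intact(const struct log_header *hdr,
                         const void *data, size_t data_len)
{
    return log_checksum(hdr, data, data_len) == hdr->checksum;
}
```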

Otherwise, it continues to the end of the logs, and the in-memory metadata becomes ready. Every piece of data on the cache device is a side effect, so applying them from the oldest to the newest reconstructs the state the storage was in when it was shut down earlier.
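The replay loop can be sketched with a toy model in which each log records a single block write. Everything here (the struct, the names, the flat array of logs) is hypothetical and much simpler than the driver's actual code, but it shows the oldest-to-newest application and the stop-at-first-broken-log behavior:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a log: one write of one value to one block.
 * Illustrative only, not the driver's structures. */
struct toy_log {
    uint64_t id;     /* log ID; intact logs have increasing IDs */
    size_t   block;  /* destination block on the backing store */
    int      value;  /* data written by this log */
    int      intact; /* 1 if the checksum verified, 0 if torn */
};

/* Replay logs from the oldest to the newest, stopping at the first
 * broken one. Returns the number of logs actually applied; the
 * storage is left in the state just before the broken log. */
static size_t replay_logs(const struct toy_log *logs, size_t n,
                          int *storage)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (!logs[i].intact)
            break;                               /* roll back here */
        storage[logs[i].block] = logs[i].value;  /* apply side effect */
    }
    return i;
}
```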

Set update_record_interval to shorten the resume time

However, iterating over all the logs takes a really long time. So there is a way to reduce it: remember the ID of the log that was most recently written back.

Suppose the IDs of the logs are ID=k to ID=k+n-1, where n is the number of segments allocated on the device, which is proportional to the size of the device. If we know that ID=k to ID=k+n-2 were written back to the backing device before shutdown, the only log we need to resume is the last one (ID=k+n-1). This reduces the resume time really effectively. If you are sensitive to the resume time, you can use this feature by setting update_record_interval to a value greater than 0 (dmsetup message $devname 0 update_record_interval $val). Note that the unit is seconds and the default value is 0. With this set, the superblock recorder will record the last written-back ID to the superblock every $val seconds.
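The arithmetic above can be written down as a small helper. The parameters k and n and the "newest + 1" sentinel for "nothing to replay" are assumptions made for this illustration, not part of Writeboost's API:

```c
#include <assert.h>
#include <stdint.h>

/* Given the oldest log on the cache device (ID=k), the number of
 * segments n, and the last written-back ID recorded in the superblock,
 * return the ID of the first log that must be replayed on resume.
 * Illustrative sketch only. */
static uint64_t first_id_to_resume(uint64_t k, uint64_t n,
                                   uint64_t last_writeback_id)
{
    uint64_t newest = k + n - 1;

    if (last_writeback_id < k)
        return k;           /* record is stale; replay everything */
    if (last_writeback_id >= newest)
        return newest + 1;  /* nothing left to replay (sentinel) */
    return last_writeback_id + 1;
}
```

For example, with k=100 and n=10, a recorded last_writeback_id of 108 (i.e. ID=k to ID=k+n-2 already written back) means only the newest log, ID=109, needs to be resumed.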

The downside of this functionality is the following:
1. It ignores caches older than the last_writeback_id, which may hurt read performance after reboot.
2. Periodically writing small data (512B) to a fixed block may shorten the lifetime of the SSD if its FTL handles such writes poorly.

What is the recommendation?

The logic to compute the last writeback ID is shown below. It is really simple, so I think no bug will be found here. If you believe this code is correct and need to reduce the resume time, go set update_record_interval to 120, for example.

static int infer_last_writeback_id(struct wb_device *wb)
{
    int r = 0;

    u64 record_id;
    struct superblock_record_device uninitialized_var(record);
    r = read_superblock_record(&record, wb);
    if (r)
        return r;

    atomic64_set(&wb->last_writeback_segment_id,
        atomic64_read(&wb->last_flushed_segment_id) > wb->nr_segments ?
        atomic64_read(&wb->last_flushed_segment_id) - wb->nr_segments : 0);

    /*
     * If last_writeback_id is recorded on the superblock,
     * we can eliminate unnecessary writeback for the segments
     * that were written back before.
     */
    record_id = le64_to_cpu(record.last_writeback_segment_id);
    if (record_id > atomic64_read(&wb->last_writeback_segment_id))
        atomic64_set(&wb->last_writeback_segment_id, record_id);

    return r;
}

Thanks for reading.