BeantownGuy80

Reputation: 51

Linux Block Device Driver: how to handle REQ_DISCARD

I have a block device driver that has been working in a commercial product for more than a year. Recently I tried to add support for thin provisioning by enabling discards and handling requests with the REQ_DISCARD flag. Whenever I call any variation of blk_end_request for these types of requests, from any context, I get a BUG() output at best and hangs or oopses at worst (the variations include blk_end_request_all and the unlocked versions prefixed with __). It also appears that when I complete the request this way (which works fine for normal read/write requests), the filesystem above, ext4, reissues the same REQ_DISCARD request, sometimes even with the same request pointer.

Here's a simplified request function (as passed to blk_init_queue) that demonstrates the problem. This is about as early as I can turn the request around, so it eliminates almost all of my code, which again works for normal reads and writes.

// This is a simplified version of the function that's passed into blk_init_queue
static void
my_request_fn(struct request_queue * queue)
{
    struct request * req;

    while ((req = blk_fetch_request(queue)) != NULL) {

        if (rq_data_dir(req) && (req->cmd_flags & REQ_DISCARD)) {
            printk(KERN_INFO "Received DISCARD request from process %d, sector=%lu, req %p\n",
                   pid_nr(task_pid(current)),
                   blk_rq_pos(req),
                   req);
            // FIXME: this is a lie
            __blk_end_request_all(req, 0);
            continue;
        }

        // ... more code hidden for brevity
    }
}

Is there something about these requests that needs to be handled fundamentally differently? I tried looking at other drivers, such as sd, md, and xenblk, but they are radically different, so the answer isn't clear from them. I guess the fundamental question is: how do you properly handle REQ_DISCARD requests and signal their completion?

In case this is a known bug, my kernel version reported by uname -a is Linux mydevbox 3.2.0-54-generic #82-Ubuntu SMP Tue Sep 10 20:08:42 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Upvotes: 2

Views: 1022

Answers (1)

BeantownGuy80

Reputation: 51

I posted here because I was at my wits' end. The solution was simple, and maybe this will help someone else who is having the same problem. There are new queue limits associated with the discard functionality. Three of them are set below.

// WARNING: these values are bad, do not use
queue->limits.discard_zeroes_data = 1;
queue->limits.max_discard_sectors = 1;
queue->limits.discard_granularity = 2048;

I had somehow transposed the last two values, so the discard granularity was very large and the maximum number of discard sectors was only 1. After commenting out the third line (which is supposed to be only a hint) and fixing the right-hand side of the second line, everything works! The values now look like the ones below.

queue->limits.discard_zeroes_data = 1;
queue->limits.max_discard_sectors = 2048;
// queue->limits.discard_granularity = 1;

If you are getting intermittent crashes and BUG() stack traces when handling REQ_DISCARD requests, double-check your queue limit configuration.
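One way to catch this kind of mix-up early is a small consistency check on the values before they go into the queue limits. The following is only a userspace sketch, not kernel code: `struct discard_limits`, `DISCARD_SECTOR_BYTES`, and `discard_limits_consistent` are hypothetical names made up for illustration. It relies on max_discard_sectors being counted in 512-byte sectors and discard_granularity being counted in bytes, and it assumes that a single discard should be able to cover at least one granule. It makes no claim about which check the kernel itself performs.

```c
#include <assert.h>   /* handy when calling the check from a test harness */
#include <stdbool.h>
#include <stdint.h>

/* Block-layer sector size in bytes. */
#define DISCARD_SECTOR_BYTES 512u

/* Hypothetical userspace mirror of the three limits from the answer. */
struct discard_limits {
    uint32_t max_discard_sectors;  /* counted in 512-byte sectors */
    uint32_t discard_granularity;  /* counted in bytes; 0 means "not set" */
    bool     discard_zeroes_data;
};

/*
 * Returns true when the limits look mutually consistent under the
 * assumption stated above: the largest allowed discard must be at
 * least one granule long. This is an illustrative rule, not a kernel API.
 */
static bool discard_limits_consistent(const struct discard_limits *l)
{
    uint64_t max_bytes =
        (uint64_t)l->max_discard_sectors * DISCARD_SECTOR_BYTES;

    if (l->max_discard_sectors == 0)
        return false;              /* discards would be impossible */
    if (l->discard_granularity == 0)
        return true;               /* no granularity constraint given */
    return max_bytes >= l->discard_granularity;
}
```

With the transposed values (max_discard_sectors = 1, discard_granularity = 2048) the largest discard is 512 bytes, smaller than one 2048-byte granule, so the check returns false. With max_discard_sectors = 2048 and the granularity left unset it returns true.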

Upvotes: 3
