your programing

Linux 블록 장치에 대한 요청 대기열을 어떻게 식별 할 수 있습니까?

lovepro 2020. 10. 8. 08:25
반응형

Linux 블록 장치에 대한 요청 대기열을 어떻게 식별 할 수 있습니까?


네트워크를 통해 하드 디스크를 연결하는이 드라이버를 작업 중입니다. 컴퓨터에서 두 개 이상의 하드 디스크를 활성화하면 첫 번째 하드 디스크 만 파티션을 살펴보고 식별하는 버그가 있습니다. 결과적으로 hda에 1 개의 파티션이 있고 hdb에 1 개의 파티션이 있으면 hda를 연결하자마자 마운트 할 수있는 파티션이 있습니다. 따라서 hda1은 마운트되는 즉시 blkid xyz123을 얻습니다. 그러나 내가 계속해서 hdb1을 마운트하면 동일한 blkid가 나오고 실제로 드라이버는 hdb가 아닌 hda에서 읽습니다.

그래서 운전자가 엉망이 된 곳을 찾은 것 같습니다. 아래는 잘못된 장치에 액세스하는 것처럼 보이는 첫 번째 지점에 넣은 dump_stack을 포함하는 디버그 출력입니다.

다음은 코드 섹션입니다.

/*basically, this is just the request_queue processor. In the log output that
  follows, the second device, (hdb) has just been connected, right after hda
  was connected and hda1 was mounted to the system. */

void nblk_request_proc(struct request_queue *q)
{
struct request *req;
ndas_error_t err = NDAS_OK;

dump_stack();

while((req = NBLK_NEXT_REQUEST(q)) != NULL)
{
    dbgl_blk(8,"processing queue request from slot %d",SLOT_R(req));

    if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))  {
        printk ("ndas: Queue is suspended\n");
        /* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
        blk_start_request(req);
#else
        blkdev_dequeue_request(req);
#endif

다음은 로그 출력입니다. 나는 무슨 일이 일어나고 있는지 그리고 어디에서 나쁜 전화가 오는지 이해하는 데 도움이되는 몇 가지 의견을 추가했습니다.

  /* Just below here you can see "slot" mentioned many times. This is the 
     identification for the network case in which the hd is connected to the 
     network. So you will see slot 2 in this log because the first device has 
     already been connected and mounted. */

  kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
  kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
  kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
  kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10

  /* So at this point the hard disk is added (gendisk=c6...) and the identifications
     all match the network device. The driver is now about to begin scanning the 
     hard drive for existing partitions. the little 'ed', at the end of the previous
     line indicates that revalidate_disk has finished it's job. 

     Also, I think the request queue is indicated by the output dpcd1 near the very
     end of the line. 

     Now below we have entered the function that is pasted above. In the function
     you can see that the slot can be determined by the queue. And the log output
     after the stack dump shows it is from slot 1. (The first network drive that was
     already mounted.) */

        kernel: [231644.155677]  ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P           2.6.32-5-686 #1
  kernel: [231644.155711] Call Trace:
  kernel: [231644.155723]  [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
  kernel: [231644.155732]  [<c11298db>] ? __generic_unplug_device+0x23/0x25
  kernel: [231644.155737]  [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
  kernel: [231644.155743]  [<c1123090>] ? blk_unplug+0x2e/0x31
  kernel: [231644.155750]  [<c10cceec>] ? block_sync_page+0x33/0x34
  kernel: [231644.155756]  [<c108770c>] ? sync_page+0x35/0x3d
  kernel: [231644.155763]  [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
  kernel: [231644.155768]  [<c10876d7>] ? sync_page+0x0/0x3d
  kernel: [231644.155773]  [<c10876aa>] ? __lock_page+0x76/0x7e
  kernel: [231644.155780]  [<c1043f1f>] ? wake_bit_function+0x0/0x3c
  kernel: [231644.155785]  [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
  kernel: [231644.155791]  [<c10d21b9>] ? blkdev_readpage+0x0/0xc
  kernel: [231644.155796]  [<c1087bbc>] ? read_cache_page_async+0x14/0x18
  kernel: [231644.155801]  [<c1087bc9>] ? read_cache_page+0x9/0xf
  kernel: [231644.155808]  [<c10ed6fc>] ? read_dev_sector+0x26/0x60
  kernel: [231644.155813]  [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
  kernel: [231644.155819]  [<c10ee138>] ? rescan_partitions+0x17e/0x378
  kernel: [231644.155825]  [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
  kernel: [231644.155830]  [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
  kernel: [231644.155836]  [<c10ed7e6>] ? register_disk+0xb0/0xfd
  kernel: [231644.155843]  [<c112e33b>] ? add_disk+0x9a/0xe8
  kernel: [231644.155848]  [<c112dafd>] ? exact_match+0x0/0x4
  kernel: [231644.155853]  [<c112deae>] ? exact_lock+0x0/0xd
  kernel: [231644.155861]  [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
  kernel: [231644.155868]  [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
  kernel: [231644.155874]  [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
  kernel: [231644.155891]  [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
  kernel: [231644.155906]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155919]  [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
  kernel: [231644.155933]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155941]  [<c1003d47>] ? kernel_thread_helper+0x7/0x10

  /* here are the output of the driver debugs. They show that this operation is
     being performed on the first devices request queue. */

  kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
  kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
  kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10

충분한 배경 ​​정보가 되었기를 바랍니다. 이 시점에서 분명한 질문은 "request_queues는 언제 어디서 할당됩니까?"입니다.

이것은 add_disk 함수 이전에 조금 처리됩니다. 디스크 추가는 로그 출력의 첫 번째 줄입니다.

slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(
    nblk_request_proc, 
    &slot->lock
);

내가 아는 한 이것은 표준 작업입니다. 그래서 내 원래 질문으로 돌아갑니다. 어딘가에서 요청 대기열을 찾고 새 장치마다 증가하거나 고유한지 확인할 수 있습니까? 아니면 Linux 커널이 각 주요 번호에 대해 하나의 대기열 만 사용합니까? 이 드라이버가 두 개의 다른 블록 스토리지에 동일한 대기열을로드하는 이유를 알아보고 이것이 초기 등록 프로세스 중에 중복 blkid를 일으키는 지 확인하고 싶습니다.

Thanks for looking at this situation for me.


Queue = blk_init_queue(sbd_request, &Device.lock);

I share the solution to the bug that led me to post this question. Though it does not actually answer the question of how to identify the device request queue.

In the code above is the following:

if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, 
       &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags))) 

Well, that "SLOT_R(req)" was causing the trouble. That is defined else where to return the gendisk device.

#define SLOT_R(_request_) SLOT((_request_)->rq_disk)

This returned the disk, but not the proper value for various operations later on. So as the extra block devices were loaded, this function basically kept returning 1. (I think it was processing as a boolean.) Therefore, all requests were piled on the to the request queue for disk 1.

The fix was to access the correct disk identification value that was already stored in the disk's private_data when the it was added to the system.

Correct identifier definition:
   #define SLOT_R(_request_) ( (int) _request_->rq_disk->private_data )

How the correct disk number was stored.
   slot->disk->queue = slot->queue;
   slot->disk->private_data = (void*) (long) s;  <-- 's' is the disk id
   slot->queue_flags = 0;

Now the correct disk id is returned from private data, so all requests to the correct queue.

As mentioned, this does not show how to identify the queue though. An un-educated guess might be:

 x = (int) _request_->rq_disk->queue->id;

Ref. the request_queue function in linux http://lxr.free-electrons.com/source/include/linux/blkdev.h#L270 & 321

Thanks everyone for helping!

참고URL : https://stackoverflow.com/questions/6785651/how-can-i-identify-the-request-queue-for-a-linux-block-device

반응형