Most efficient way to cut an output between regex matches?

Question

I'm trying to parse the output of lspci -k device by device. In other words, with this sample output:

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device 5000
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device d000
    Kernel driver in use: i915
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device 5000
    Kernel driver in use: snd_hda_intel
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
    Subsystem: Gigabyte Technology Co., Ltd Device 5001
    Kernel driver in use: mei_me

I want to be able to traverse each device and its associated information individually. The regular expression I'm using elsewhere to detect the ??:??.? format is:

grep -E '^[0-9]\w:[0-9]\w\.[0-9]' <<< "$s" | awk -F ' ' '{print $1}'

where $s holds however many devices are on the list in this format. I was using this because I also had non-PCI devices listed in a different format.

In this case, I was thinking I could get the line number of each match by piping the above statement into grep -n, then use sed to cut from one match to the next, but I feel this wouldn't be an efficient way of going about it. Any suggestions?

Another solution I'm considering is reading in line-by-line and converting the whitespace into some symbol: e.g.

tr ' ' '%'

and if a line starts with that, it is included. This could get tricky however, because I would need an external variable outside the loop. Of course I could also possibly add a after each instance of the regex and then just set the:

IFS=$'
'

Given that they are tabbed, a

tr $'\t' 'x'

works well. However, I feel the most efficient way is still to somehow cut an entire section then grep the information I need, as opposed to going line-by-line with random variables.

Tom Fenech · Accepted Answer

The following code splits each entry from lspci -k into sections:

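$ /sbin/lspci -k | awk -F'\t' '
NF == 1 { ++n; f = 0 }      # unindented line: start a new section
{ a[n, ++f] = $NF }         # store the line text without its leading tab
END {
    for (i = 1; i <= n; ++i) {
        print "section", i
        f = 0
        while ((i, ++f) in a) print a[i, f]   # stop at the first missing index
        print ""
    }
}'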

By setting the input field separator to a tab character, we can identify which lines are the start of a new section by how many fields they have; the start of each section only has 1 field.
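
To see the field counts for yourself, here is a minimal sketch that prints NF next to the last field of every line. It assumes the indented lines really do begin with a tab character; the sample variable and the field_counts function are names made up for this illustration, standing in for real lspci -k output.

```shell
# Sample data standing in for `lspci -k` output (assumption: detail lines
# are indented with a single tab, as described in the question).
sample=$(printf '%s\n\t%s\n\t%s\n' \
    '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
    'Subsystem: Gigabyte Technology Co., Ltd Device d000' \
    'Kernel driver in use: i915')

# Hypothetical helper: print the number of tab-separated fields on each
# line, followed by the last field.
field_counts() {
    awk -F'\t' '{ print NF ": " $NF }'
}

printf '%s\n' "$sample" | field_counts
```

The device line is reported with 1 field, while each indented line is reported with 2, because the leading tab creates an empty first field. That is why NF == 1 is enough to detect the start of a section, and why $NF gives the text without the indentation.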

The code in the END block demonstrates that each stored line can be reached in the array a using two indices: the section number and the field number (the line's position within that section). It just loops through each one, but you could customise the logic to print a given field only if it matched a pattern, for example.
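
As one possible customisation, the sketch below keeps the same two-index array but prints only the slot address and the kernel driver for devices that have one. The sample data and the drivers_by_device function name are assumptions made for this example, and it again assumes tab-indented detail lines.

```shell
# Sample data standing in for `lspci -k` output (assumption: detail lines
# are indented with a single tab).
sample=$(printf '%s\n\t%s\n%s\n\t%s\n\t%s\n' \
    '00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)' \
    'Subsystem: Gigabyte Technology Co., Ltd Device 5000' \
    '00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)' \
    'Subsystem: Gigabyte Technology Co., Ltd Device d000' \
    'Kernel driver in use: i915')

# Hypothetical helper: group lines into sections as before, then print
# "<slot address> <driver>" for every section that names a kernel driver.
drivers_by_device() {
    awk -F'\t' '
    NF == 1 { ++n; f = 0 }
    { a[n, ++f] = $NF }
    END {
        for (i = 1; i <= n; ++i) {
            split(a[i, 1], head, " ")      # head[1] is the slot address
            for (f = 2; (i, f) in a; ++f) {
                if (a[i, f] ~ /^Kernel driver in use: /) {
                    driver = a[i, f]
                    sub(/^Kernel driver in use: /, "", driver)
                    print head[1], driver
                }
            }
        }
    }'
}

printf '%s\n' "$sample" | drivers_by_device
```

With this sample the only line printed is "00:02.0 i915", since the host bridge section has no driver line. On a real system you would pipe /sbin/lspci -k into the function instead of the sample text.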
