ksm18

Reputation: 93

Most efficient way to cut an output between regex matches?

I'm trying to parse the output of lspci -k device by device. In other words, given this sample output:

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device 5000
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device d000
    Kernel driver in use: i915
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
    Subsystem: Gigabyte Technology Co., Ltd Device 5000
    Kernel driver in use: snd_hda_intel
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
    Subsystem: Gigabyte Technology Co., Ltd Device 5001
    Kernel driver in use: mei_me

I want to be able to traverse each device and its associated information individually. The regular expression I'm using elsewhere to detect the ??:??.? format is:

grep -E '^[0-9]\w:[0-9]\w\.[0-9]' <<< "$s" | awk -F ' ' '{print $1}'

where $s holds however many devices are on the list in this format. I was using this because I also had non-PCI devices listed in a different format.

In this case, I was thinking I could get the line number of each match (using grep -n), then use sed to cut from one match to the next, but I feel this wouldn't be an efficient way of going about it. Any suggestions?

Another solution I'm considering is reading the output line by line and converting the whitespace into some symbol, e.g. tr ' ' '%', so that any line starting with that symbol is included with the current device. This could get tricky, however, because I would need a variable outside the loop. Of course, I could also add a \n after each instance of the regex and then just set:

IFS=$'\n'

Given that the detail lines are tab-indented, tr $'\t' 'x' works well. However, I feel the most efficient way is still to somehow cut out an entire section and then grep it for the information I need, as opposed to going line by line with ad hoc variables.

Upvotes: 2

Views: 340

Answers (2)

Jani

Reputation: 911

Are you limited to lspci -k? See, for example, lspci -mm and lspci -vkmm, which are much easier to parse.

I am not sure what your end goal is, but provided that you're indeed not limited to lspci -k, I cooked up an example that you might find interesting.

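#!/bin/sh
# For each PCI slot, print the slot, its kernel driver and the module file.
for slot in $(lspci -mm | cut -d " " -f 1); do
    # -vkmm prints tab-separated "Tag:<TAB>value" lines; field 2 is the value
    driver=$(lspci -vkmm -s "$slot" | grep "^Driver:" | cut -f 2)
    if [ -n "$driver" ]; then
        filename=$(/sbin/modinfo --filename "$driver" 2>/dev/null)
        echo "$slot $driver $filename"
    fi
done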
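
The loop above runs lspci once per slot. If that turns out to be slow, the same slot/driver pairs can probably be collected in a single pass. This is only a minimal sketch: it assumes lspci -vkmm prints one record per device, separated by blank lines, with tab-separated "Slot:" and "Driver:" lines. The function name drivers_by_slot is made up for illustration.

```shell
#!/bin/sh
# drivers_by_slot (hypothetical helper): reads `lspci -vkmm`-style records on
# stdin and prints "slot driver" for every device that reports a driver.
drivers_by_slot() {
    awk -F'\t' '
        $1 == "Slot:"   { slot = $2; driver = "" }   # a new record starts
        $1 == "Driver:" { driver = $2 }
        NF == 0 && slot != "" {                      # a blank line ends a record
            if (driver != "") print slot, driver
            slot = ""
        }
        END { if (slot != "" && driver != "") print slot, driver }
    '
}

# Usage: lspci -vkmm | drivers_by_slot
```

The module file name could then be looked up with modinfo for each printed driver, as in the loop above.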

Upvotes: 1

Tom Fenech

Reputation: 74595

The following code splits each entry from lspci -k into sections:

$ /sbin/lspci -k | awk -F'\t' '
NF == 1 { ++n; f = 0 }       # no tab on the line: a new section starts
{ a[n, ++f] = $NF }          # store the line (minus any leading tab)
END {
    for (i = 1; i <= n; ++i) {
        print "section", i
        for (f = 1; (i, f) in a; ++f) print a[i, f]
        print ""
    }
}'

By setting the input field separator to a tab character, we can identify which lines are the start of a new section by how many fields they have; the start of each section only has 1 field.

The code in the END block demonstrates that each stored line can be reached in the array a using two indices: the section number and the line number within that section. It just loops through each one, but you could customise the logic to print a given line only if it matched a pattern, for example.
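
As one possible customisation, here is a minimal sketch that prints only the kernel driver of devices whose header line matches a pattern. It relies on the same tab-based section detection but filters while reading instead of storing everything in the array. The function name driver_for is hypothetical, and the sketch assumes the detail lines are tab-indented and the driver line starts with "Kernel driver in use: ", as in the sample output.

```shell
#!/bin/sh
# driver_for (hypothetical helper): reads `lspci -k`-style text on stdin and
# prints the "Kernel driver in use" value of each device whose header line
# matches the extended regex passed as the first argument.
driver_for() {
    awk -F'\t' -v pat="$1" '
        NF == 1 { want = ($0 ~ pat) }    # header line: remember if it matches
        NF > 1 && want && $NF ~ /^Kernel driver in use: / {
            sub(/^Kernel driver in use: /, "", $NF)   # keep only the value
            print $NF
        }
    '
}

# Usage: /sbin/lspci -k | driver_for "VGA compatible controller"
```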

Upvotes: 3
