xeruf

Reputation: 2990

Process part of a line through the shell pipe

I would like to process part of each line of command output, leaving the rest untouched.

Problem

Let's say I have some du output:

❯ du -xhd 0 /usr/lib/gr*
3.2M    /usr/lib/GraphicsMagick-1.3.40
584K    /usr/lib/grantlee
12K     /usr/lib/graphene-1.0
4.2M    /usr/lib/graphviz
4.0K    /usr/lib/grcrt1.o
224K    /usr/lib/groff

Now I want to process each path with another command, for example running pacman -Qo on it, leaving the remainder of the line untouched.

Approach

I know I can use awk '{print $2}' to get only the path, and could probably weld it back together in a convoluted for loop, but maybe there is a more elegant way, ideally easy to type on the fly, that produces this in the end:

3.2M    /usr/lib/GraphicsMagick-1.3.40/ is owned by graphicsmagick 1.3.40-2
584K    /usr/lib/grantlee/ is owned by grantlee 5.3.1-1
12K     /usr/lib/graphene-1.0/ is owned by graphene 1.10.8-1
4.2M    /usr/lib/graphviz/ is owned by graphviz 7.1.0-1
4.0K    /usr/lib/grcrt1.o is owned by glibc 2.36-7
224K    /usr/lib/groff/ is owned by groff 1.22.4-7

Workaround

This is the convoluted contraption I am living with for now:

❯ du -xhd 0 /usr/lib/gr* | while read line; do echo "$line $(pacman -Qqo $(echo $line | awk '{print $2}') | paste -s -d',')"; done | column -t
3.2M  /usr/lib/GraphicsMagick-1.3.40  graphicsmagick
584K  /usr/lib/grantlee               grantlee,grantleetheme
12K   /usr/lib/graphene-1.0           graphene
4.2M  /usr/lib/graphviz               graphviz
4.0K  /usr/lib/grcrt1.o               glibc
224K  /usr/lib/groff                  groff

But several parts of it are specific to pacman. Here is a second attempt:

❯ du -xhd 0 /usr/lib/gr* | while read line; do echo "$line" | awk '{ORS=" "; print $1}'; pacman --color=always -Qo $(echo $line | awk '{print $2}') | head -1; done | column -t
3.2M  /usr/lib/GraphicsMagick-1.3.40/  is  owned  by  graphicsmagick  1.3.40-2
584K  /usr/lib/grantlee/               is  owned  by  grantlee        5.3.1-1
12K   /usr/lib/graphene-1.0/           is  owned  by  graphene        1.10.8-1
4.2M  /usr/lib/graphviz/               is  owned  by  graphviz        7.1.0-1
4.0K  /usr/lib/grcrt1.o                is  owned  by  glibc           2.36-7
224K  /usr/lib/groff/                  is  owned  by  groff           1.22.4-7

This is more generic, but what if the output has three columns and I want to process only the middle one? The complexity grows quickly, and I suspect there is a simpler way that avoids the duplication.

Upvotes: 0

Views: 80

Answers (2)

tripleee

Reputation: 189447

Use a simple shell loop.

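du -xhd 0 /usr/lib/gr* |
while read -r size path; do
    # pacman already prints "<path> is owned by <package> <version>",
    # so only the size needs to be put back in front of it.
    pacman --color=always -Qo "$path" |
    awk -v sz="$size" '{ printf "%s\t%s\n", sz, $0 }'
done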

If you want to split out parts of the output from pacman, Awk makes that easy to do; for example, going by the output shown in the question, the package name is the second-to-last field, Awk's $(NF-1), and the version is the last one, $NF.
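As a rough sketch of that splitting, assuming pacman -Qo lines look like the ones quoted in the question: counting fields from the end keeps working even when the path contains spaces. The name owner_fields is made up for this illustration and is not part of pacman.

```bash
# Hypothetical helper (not part of pacman): reads lines shaped like
#   /usr/lib/groff/ is owned by groff 1.22.4-7
# and prints only "<package> <version>".
# Assumption: the last two fields are always the name and the version.
owner_fields() {
  awk 'NF >= 2 { print $(NF-1), $NF }'
}

# Sample line copied from the question, so no pacman is needed to try it.
printf '%s\n' '/usr/lib/groff/ is owned by groff 1.22.4-7' | owner_fields
# prints: groff 1.22.4-7
```

In the loop this would be used as pacman -Qo "$path" | owner_fields. When parsing the output like this it is probably wise to leave out --color=always, since any color escape codes would end up inside the fields.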

(Sorry, I don't have pacman here; perhaps edit your question to show its output if you need more details. Going forward, please take care to ask the actual question you need help with, so you don't have to move the goalposts by editing after you have received replies. This is problematic for many reasons, not least because the answers you already received will seem wrong or unintelligible if they no longer answer the question as it stands after your edit.)

These days, many tools have options to let you specify which fields exactly you want to output, and a formatting option to produce them in machine-readable format. The pacman man page mentions a --machinereadable option, though it does not seem to be of particular use here. Many modern tools will produce JSON, which can be unwieldy to handle in shell scripts, but easy if you have a tool like jq which understands JSON format (less convenient if the only available output format is XML; some tools will let you get the result as CSV, which is mildly clumsy but relatively easy to parse). Maybe also look for an option like --format for specifying how exactly to arrange the output. (In curl it's called -w/--write-out.)

Upvotes: 2

oguz ismail

Reputation: 50775

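Use a bash loop:

du -xhd 0 /usr/lib/gr* | (
  IFS=$'\t'   # du separates size and path with a tab
  while read -r -a fields; do
    fields[1]=$(pacman -Qo "${fields[1]}")
    printf '%s\n' "${fields[*]}"   # [*] rejoins the fields with the tab
  done
)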
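
For the three-column case raised in the question, the same loop can be wrapped in a function that takes the field number and the command as arguments. This is only a sketch: map_field is a made-up name, and it assumes tab-separated input without empty fields (read collapses consecutive tabs). basename stands in for pacman -Qo so the example runs anywhere.

```bash
#!/usr/bin/env bash
# map_field N CMD [ARGS...]   (hypothetical helper, not a standard tool)
# Reads tab-separated lines on stdin, replaces field N (1-based) with the
# first line printed by "CMD ARGS... FIELD", and prints the line again.
map_field() {
  local n=$1; shift
  local IFS=$'\t'          # split on tabs here, join on tabs below
  local -a fields
  local out
  while read -r -a fields; do
    # </dev/null keeps the command from swallowing the rest of the input.
    out=$("$@" "${fields[n-1]}" </dev/null)
    fields[n-1]=${out%%$'\n'*}     # keep only the first output line
    printf '%s\n' "${fields[*]}"
  done
}

# Process only the middle column of three.
printf 'a\t/usr/lib/groff\tc\n' | map_field 2 basename
# prints: a<TAB>groff<TAB>c
```

With pacman available, the call for the question would be du -xhd 0 /usr/lib/gr* | map_field 2 pacman -Qo.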

Upvotes: 2
