Jingnan Jia

Reputation: 1299

How to extract the number after a specific word using awk?

I have several lines of text. I want to extract the number after a specific word using awk.

I tried the following code but it does not work.

First, create the test file with: vi test.text. It has 3 columns (the 3 fields are generated by other pipeline commands using awk).

Index  AllocTres                              CPUTotal
1      cpu=1,mem=256G                         18
2      cpu=2,mem=1024M                        16
3                                             4
4      cpu=12,gres/gpu=3                      12
5                                             8
6                                             9
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21

Please note that several fields in this file are empty. What I want is to extract the number after the first gres/gpu= in each line (defaulting to 0 if gres/gpu= does not occur in that line) using a pipeline like: cat test.text | awk '{some_commands}' so that the output has 4 columns:

Index  AllocTres                              CPUTotal   GPUAllocated
1      cpu=1,mem=256G                         18         0
2      cpu=2,mem=1024M                        16         0
3                                             4          0
4      cpu=12,gres/gpu=3                      12         3
5                                             8          0
6                                             9          0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20         4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21         3

Upvotes: 0

Views: 1677

Answers (4)

ufopilot

Reputation: 3975

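awk '
    BEGIN{FS="\t"}
    NR==1{
        $(NF+1)="GPUAllocated"            # header: append the new column name
    }
    NR>1{
        $(NF+1)=FS 0                      # data lines: default value is 0
    }
    /gres\/gpu=/{
        split($0, a, "=")                 # a[3] is the text after the second "="
        gp=a[3]; gsub(/[ ,].*/, "", gp)   # drop everything from the first space or comma
        $NF=FS gp                         # overwrite the default with the extracted number
    }1' test.text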

Index  AllocTres                              CPUTotal GPUAllocated
1      cpu=1,mem=256G                         18        0
2      cpu=2,mem=1024M                        16        0
3                                             4         0
4      cpu=12,gres/gpu=3                      12        3
5                                             8         0
6                                             9         0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20        4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21        3

Upvotes: 0

RavinderSingh13

Reputation: 133458

With your shown samples, you can try the following code, written and tested in GNU awk. It uses awk's match function with the regex gres\/gpu=([0-9]+) (the / is escaped), whose single capturing group captures all digits that follow the =. For each line it prints the current line followed by the first element of the array arr plus 0, so that 0 is printed when a line has no match.

awk '
FNR==1{
  print $0,"GPUAllocated"
  next
}
{
  match($0,/gres\/gpu=([0-9]+)/,arr)
  print $0,arr[1]+0
}
' Input_file
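
The three-argument form of match() used above is a GNU awk extension. As a minimal sketch for awks without it, the same idea can be expressed with the two-argument match() and the RSTART and RLENGTH variables it sets. The function name add_gpu_column and the shortened sample lines are assumptions made for illustration only.

```shell
# add_gpu_column is a hypothetical helper name, not part of the answer above.
add_gpu_column() {
  awk '
    FNR == 1 { print $0, "GPUAllocated"; next }   # header line
    {
      gpu = 0                                     # default when nothing matches
      # match() reports the leftmost match through RSTART and RLENGTH
      if (match($0, /gres\/gpu=[0-9]+/)) {
        # skip the 9 characters of "gres/gpu=" and keep only the digits
        gpu = substr($0, RSTART + 9, RLENGTH - 9) + 0
      }
      print $0, gpu
    }
  ' "$@"
}

# Usage on a few shortened sample lines read from standard input
printf '%s\n' \
  'Index  AllocTres  CPUTotal' \
  '4  cpu=12,gres/gpu=3  12' \
  '5    8' | add_gpu_column
```

Because the function passes its arguments through to awk, it can also be called as add_gpu_column test.text.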

Upvotes: 1

sseLtaH

Reputation: 11217

Using sed

$ sed '1s/$/\tGPUAllocated/;s~.*gres/gpu=\([0-9]\+\).*~& \t\1~;1!{\~gres/gpu=[0-9]~!s/$/ \t0/}' input_file
Index  AllocTres                              CPUTotal  GPUAllocated
1      cpu=1,mem=256G                         18        0
2      cpu=2,mem=1024M                        16        0
3                                             4         0
4      cpu=12,gres/gpu=3                      12        3
5                                             8         0
6                                             9         0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20        4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21        3

Upvotes: 0

Daweo

Reputation: 36390

First: awk does not need cat; it can read files on its own. Combining cat and awk is generally discouraged as a useless use of cat.

For this task I would use GNU AWK in the following way. Let the content of file.txt be
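
As a small illustration of that point, the two invocations below should give the same result; the second simply lets awk open the file itself. The temporary file and its two lines are made up for this sketch.

```shell
# Throwaway input file, created only for this comparison
tmp=$(mktemp)
printf '%s\n' 'a 1' 'b 2' > "$tmp"

with_cat=$(cat "$tmp" | awk '{ print $2 }')   # extra cat process and pipe
without_cat=$(awk '{ print $2 }' "$tmp")      # awk reads the file directly

rm -f "$tmp"
echo "$without_cat"
```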

cpu=1,mem=256G
cpu=2,mem=1024M

cpu=12,gres/gpu=3


cpu=13,gres/gpu=4,gres/gpu:ret6000=2
mem=12G,gres/gpu=3,gres/gpu:1080ti=1

then

awk 'BEGIN{FS="gres/gpu="}{print $2+0}' file.txt

output

0
0
0
3
0
0
4
3

Explanation: I inform GNU AWK that the field separator (FS) is gres/gpu=, then for each line I print the 2nd field increased by zero. For lines without gres/gpu=, $2 is an empty string, which is treated as zero in an arithmetic context, so zero plus zero gives zero. For lines with at least one gres/gpu=, adding zero makes GNU AWK use the longest prefix that is a legal number: 3 (4th line) becomes 3, 4,gres/gpu:ret6000=2 (7th line) becomes 4, and 3,gres/gpu:1080ti=1 (8th line) becomes 3.
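
The same field-separator trick can probably be applied to the full three-column file from the question as well, because printing $0 leaves the original line untouched while $2 still begins with the number. The following is a sketch under that assumption; the function name gpu_via_fs and the shortened sample lines are hypothetical.

```shell
# gpu_via_fs is a hypothetical helper name used only for this sketch.
gpu_via_fs() {
  awk '
    BEGIN { FS = "gres/gpu=" }                   # split each line at gres/gpu=
    NR == 1 { print $0, "GPUAllocated"; next }   # header line
    { print $0, $2 + 0 }                         # empty $2 gives 0, otherwise the leading number
  ' "$@"
}

# Usage on a few shortened sample lines read from standard input
printf '%s\n' \
  'Index  AllocTres  CPUTotal' \
  '7  cpu=13,gres/gpu=4,gres/gpu:ret6000=2  20' \
  '3    4' | gpu_via_fs
```

Since $2 is everything between the first and second gres/gpu=, this picks up the number after the first occurrence in each line.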

(tested in GNU Awk 5.0.1)

Upvotes: 2
