Reubens4Dinner

Reputation: 343

Awk match multiple strings and print both fields on the same line

I have an input file whose fields do not appear in a consistent order. What I'm trying to do is find two specific fields on each line and print their contents together on the same line.

EDIT: Here is an example of the input file:

abc=012 aaa=000 cba=210 bbb=111
aaa=555 abc=567 cba=765 bbb=666
aaa=444 abc=456 bbb=555 cba=654

These commands almost work:

  awk '{for(i=1;i<=NF;i++){if ($i ~ /aaa/) {print $i}}}' file
  awk '{for(i=1;i<=NF;i++){if ($i ~ /bbb/) {print $i}}}' file

However, they print every match on its own line. Because each command is a separate pass over the file, all the aaa values come out before all the bbb values:

aaa=000
aaa=555
aaa=444
bbb=111
bbb=666
bbb=555

What I need is for each bbb field to follow the aaa field from the same input line, like this:

aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555

How can this be done?
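As a stopgap, the two commands above can be joined line by line with the POSIX paste utility. This is a minimal sketch that assumes every input line contains exactly one aaa field and exactly one bbb field; otherwise the two columns drift out of step. The file names aaa.txt and bbb.txt are hypothetical temporaries.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'abc=012 aaa=000 cba=210 bbb=111' \
              'aaa=555 abc=567 cba=765 bbb=666' \
              'aaa=444 abc=456 bbb=555 cba=654' > file

# Run the two commands from the question, saving each column to a
# temporary file (aaa.txt and bbb.txt are hypothetical names).
awk '{for(i=1;i<=NF;i++){if ($i ~ /aaa/) {print $i}}}' file > aaa.txt
awk '{for(i=1;i<=NF;i++){if ($i ~ /bbb/) {print $i}}}' file > bbb.txt

# paste joins line N of aaa.txt with line N of bbb.txt, separated by a space.
paste -d ' ' aaa.txt bbb.txt
```

A single awk pass, as in the answers below, avoids the temporary files and the one-match-per-line assumption.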

Upvotes: 3

Views: 3086

Answers (3)

RomanPerekhrest

Reputation: 92904

awk solution:

awk '{ for(i=1;i<=NF;i++) if($i~/^(aaa|bbb)=/){ printf "%s%s",(!c++? "":FS),$i  }; 
       print ""; c=0 }' file

Or, more briefly, with GNU awk (assuming that aaa always comes first; the three-argument form of match() is a GNU awk extension):

awk 'match($0,/(aaa=[0-9]+).* (bbb=[0-9]+)/,a){ print a[1],a[2] }' file
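If the fields can appear in any order and you want aaa printed before bbb every time, or you don't have GNU awk, one option is to index each field by the text before its first "=". This is a sketch rather than part of the original answer; the shell function name pick_aaa_bbb is hypothetical, and a line that lacks one of the keys simply prints the key that was found.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'abc=012 aaa=000 cba=210 bbb=111' \
              'aaa=555 abc=567 cba=765 bbb=666' \
              'aaa=444 abc=456 bbb=555 cba=654' > file

# Hypothetical helper: for each line, print aaa=... then bbb=...,
# wherever they appear on the line.
pick_aaa_bbb() {
    awk '{
        split("", val)                  # portable way to empty the array for this line
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")         # position of the first "=" in the field
            if (eq > 1)                 # store the whole field under its key
                val[substr($i, 1, eq - 1)] = $i
        }
        out = ""
        if ("aaa" in val) out = val["aaa"]
        if ("bbb" in val) out = out (out == "" ? "" : " ") val["bbb"]
        print out
    }' "$@"
}

pick_aaa_bbb file
```

Because the lookup is by exact key, a field such as xaaa=1 is not mistaken for aaa, and a line like bbb=2 aaa=1 still prints aaa=1 bbb=2.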

The output for both approaches:

aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555

Upvotes: 1

Rahul Verma

Reputation: 3089

Using GNU awk with a multi-character (regular expression) record separator:

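This keeps each pair on one line even if bbb comes before aaa in the string (the two fields are printed in the order they appear), but it assumes every line contains exactly one aaa field and one bbb field.

$ awk -v RS="[ \n]" '/aaa|bbb/{ printf "%s%s", $1, (i++%2==0 ? " " : ORS) }' file

Output:

aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555

-v RS="[ \n]" : Treat every space and every newline as a record separator, so each key=value token becomes its own record.

/aaa|bbb/{ printf "%s%s", $1, (i++%2==0 ? " " : ORS) } : If the record contains aaa or bbb, print it. When i++%2==0 (the first match of a pair), append a space; otherwise append ORS (a newline). Passing $1 as an argument to a "%s" format, rather than using it as the format string itself, keeps a % in the data from being treated as a format directive.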

Upvotes: 1

Akshay Hegde

Reputation: 16997

Here is an awk solution using the match() and substr() functions. Set the search="..." variable to the keys you need; the fields are printed in the same order you list them, and a key that is missing from a line prints as an empty field.

awk -v search="aaa,bbb" '
BEGIN{
    n=split(search, arr, /,/) 
}
{
    for(i=1; i in arr; i++)
          printf("%s%s", (match($0,"(^| )"arr[i]"=[^ ]*") ? substr($0,(RSTART>1?RSTART+1:RSTART),(RSTART>1?RLENGTH-1:RLENGTH)) : ""), i==n ? ORS : OFS)      
}' infile

Test Results:

akshay@db-3325:/tmp$ cat infile
abc=012 aaa=000 cba=210 bbb=111
aaa=555 abc=567 cba=765 bbb=666
aaa=444 abc=456 bbb=555 cba=654

akshay@db-3325:/tmp$ awk -v search="aaa,bbb" '
BEGIN{
    n=split(search, arr, /,/) 
}
{
    for(i=1; i in arr; i++)
          printf("%s%s", (match($0,"(^| )"arr[i]"=[^ ]*") ? substr($0,(RSTART>1?RSTART+1:RSTART),(RSTART>1?RLENGTH-1:RLENGTH)) : ""), i==n ? ORS : OFS)      
}' infile
aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555

Explanation

awk -v search="aaa,bbb" '             # call awk, setting the variable search
BEGIN{

    # split string in variable search
    # into array, separated by comma
    # arr[1]  will have aaa
    # arr[2]  will have bbb
    # variable n will have 2, the number of elements in arr

    n=split(search, arr, /,/) 
}
{
    # loop through array arr
    for(i=1; i in arr; i++)
    {
         found = 0                   # default state

         # if there is match
         # beginning or space followed by your word
         # = anything except space char
         # which creates regexp like : 
         #    /(^| )aaa=[^ ]*/
         #    /(^| )bbb=[^ ]*/
         # if matches then 

         if(match($0,"(^| )"arr[i]"=[^ ]*")){ 

             # if the match is not at the beginning of the line, it starts with a space,
             # so increment the starting position and decrement the length to skip it
             if(RSTART>1){
               RSTART++              # skip the leading space
               RLENGTH--             # length is one char shorter
             }
            found =1                 # found flag
         }

         # ternary operator syntax : ( your_condition ) ? true_action : false_action 
         # if found is true then use substr
         # else ""
         # if i equals n then print the output record separator (ORS), else the output field separator (OFS)
         printf("%s%s", ( found ? substr($0,RSTART,RLENGTH) : ""), i==n ? ORS : OFS)
    }      
}' infile

Upvotes: 2
