Reputation: 343
I have an input file that does not have a consistent structure for the fields. What I'm trying to do is find the correct two fields and print their content on the same line.
EDIT: Here is a potential example for the input file:
abc=012 aaa=000 cba=210 bbb=111
aaa=555 abc=567 cba=765 bbb=666
aaa=444 abc=456 bbb=555 cba=654
This program almost works:
awk '{for(i=1;i<=NF;i++){if ($i ~ /aaa/) {print $i}}}' file
awk '{for(i=1;i<=NF;i++){if ($i ~ /bbb/) {print $i}}}' file
However, this prints each field on its own line, and the aaa and bbb values from the same input line are not paired:
aaa=000
aaa=555
aaa=444
bbb=111
bbb=666
bbb=555
What I need is for the field bbb to follow the field aaa on the same line, like this:
aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555
How can this be done?
Upvotes: 3
Views: 3086
Reputation: 92904
awk solution:
awk '{ for(i=1;i<=NF;i++) if($i~/^(aaa|bbb)=/){ printf "%s%s",(!c++? "":FS),$i };
print ""; c=0 }' file
Or, more briefly, with GNU awk (assuming that aaa always comes before bbb):
awk 'match($0,/(aaa=[0-9]+).* (bbb=[0-9]+)/,a){ print a[1],a[2] }' file
The output for both approaches:
aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555
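If the order of aaa and bbb within a line is not guaranteed, one option is to remember each field as it is seen and print both once the whole line has been scanned. This is only a minimal sketch under that assumption, not part of the answer above: the function name pick_pair is made up for illustration, and a field that is missing from a line is printed as an empty string.

```shell
# pick_pair (hypothetical helper): read lines of key=value tokens on stdin
# and print the aaa and bbb tokens of each line, always in that order.
pick_pair() {
  awk '{
    a = b = ""                       # reset for every input line
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^aaa=/) a = $i       # remember the aaa field
      else if ($i ~ /^bbb=/) b = $i  # remember the bbb field
    }
    print a, b                       # joined by OFS (a space by default)
  }'
}

# Sample input in which bbb precedes aaa on the second line.
printf '%s\n' 'abc=012 aaa=000 cba=210 bbb=111' \
              'bbb=666 abc=567 cba=765 aaa=555' | pick_pair
```

With this sample input both output lines start with the aaa field, even though bbb comes first on the second input line.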
Upvotes: 1
Reputation: 3089
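Using GNU awk with a multi-character (regex) record separator. This also works if bbb comes before aaa in the line; the two fields are then printed in the order in which they appear, and the pairing relies on every line containing exactly one of each: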
$ awk -v RS="[ \n]" '/aaa|bbb/{ printf $1 (i++%2==0? " " : ORS) }' file
Output:
aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555
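-v RS="[ \n]" : set space and \n as the record separators, so every key=value token becomes its own record
/aaa|bbb/{ printf $1 (i++%2==0? " " : ORS) } : if the record contains aaa or bbb, print it. If i++%2==0 (the first match of a pair), append a space; otherwise append ORS, which is \n by default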
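One thing to watch with the one-token-per-record approach: passing the data itself as the printf format means a % sign in a value would be interpreted as a format specifier. The sketch below is a variation, not the answer's exact command. It uses an explicit "%s%s" format and anchors the pattern to the start of the token. The function name pair_tokens and the counter n are made up for illustration, and it assumes GNU awk or another awk that accepts a regex as RS.

```shell
# pair_tokens (hypothetical helper): treat every space- or newline-separated
# token as its own record and print matching tokens two per line.
# Assumes GNU awk (or another awk that accepts a regex as RS).
pair_tokens() {
  awk -v RS='[ \n]' '
    /^(aaa|bbb)=/ {
      # Explicit "%s%s" format: a % sign in the data is printed literally.
      printf "%s%s", $1, (n++ % 2 == 0 ? " " : ORS)
    }'
}

# The bbb value contains a % sign and comes before aaa.
printf '%s\n' 'abc=012 bbb=50% cba=210 aaa=000' | pair_tokens
```

As in the original command, the output follows the order of the tokens in the input, so this sample prints bbb before aaa.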
Upvotes: 1
Reputation: 16997
Here is an awk solution using the match() and substr() functions. Modify the search="..." variable as needed; the fields are printed in the same order as the names you list in it.
awk -v search="aaa,bbb" '
BEGIN{
n=split(search, arr, /,/)
}
{
for(i=1; i in arr; i++)
printf("%s%s", (match($0,"(^| )"arr[i]"=[^ ]*") ? substr($0,(RSTART>1?RSTART+1:RSTART),(RSTART>1?RLENGTH-1:RLENGTH)) : ""), i==n ? ORS : OFS)
}' infile
Test Results:
akshay@db-3325:/tmp$ cat infile
abc=012 aaa=000 cba=210 bbb=111
aaa=555 abc=567 cba=765 bbb=666
aaa=444 abc=456 bbb=555 cba=654
akshay@db-3325:/tmp$ awk -v search="aaa,bbb" '
BEGIN{
n=split(search, arr, /,/)
}
{
for(i=1; i in arr; i++)
printf("%s%s", (match($0,"(^| )"arr[i]"=[^ ]*") ? substr($0,(RSTART>1?RSTART+1:RSTART),(RSTART>1?RLENGTH-1:RLENGTH)) : ""), i==n ? ORS : OFS)
}' infile
aaa=000 bbb=111
aaa=555 bbb=666
aaa=444 bbb=555
Explanation
awk -v search="aaa,bbb" ' # call awk set variable search
BEGIN{
# split string in variable search
# into array, separated by comma
# arr[1] will have aaa
# arr[2] will have bbb
# variable n will have 2, which is count of array
n=split(search, arr, /,/)
}
{
# loop through array arr
for(i=1; i in arr; i++)
{
found = 0 # default state
# if there is match
# beginning or space followed by your word
# = anything except space char
# which creates regexp like :
# /(^| )aaa=[^ ]*/
# /(^| )bbb=[^ ]*/
# if matches then
if(match($0,"(^| )"arr[i]"=[^ ]*")){
# if the match was not at the beginning, it starts with a space char
# so increment the starting position and decrement the length
if(RSTART>1){
RSTART++ # skip the leading space: start one char later
RLENGTH-- # length is one char shorter
}
found =1 # found flag
}
# ternary operator syntax : ( your_condition ) ? true_action : false_action
# if found is true then use substr
# else ""
# if i equals n then print the output record separator, else the output field separator
printf("%s%s", ( found ? substr($0,RSTART,RLENGTH) : ""), i==n ? ORS : OFS)
}
}' infile
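To see how the order of the names in search controls the order of the output, the same logic can be wrapped in a small shell function and called with a different list. This is a sketch only: the wrapper name pick_fields is hypothetical, and it strips the leading space with sub() rather than adjusting RSTART and RLENGTH, which is a swapped-in simplification and not the code explained above.

```shell
# pick_fields (hypothetical wrapper): $1 is the comma-separated list of names;
# lines of key=value tokens are read from stdin.
pick_fields() {
  awk -v search="$1" '
    BEGIN { n = split(search, arr, /,/) }
    {
      for (i = 1; i <= n; i++) {
        out = ""
        if (match($0, "(^| )" arr[i] "=[^ ]*")) {
          out = substr($0, RSTART, RLENGTH)
          sub(/^ /, "", out)         # drop the leading space, if one matched
        }
        printf("%s%s", out, (i == n ? ORS : OFS))
      }
    }'
}

# Ask for three fields in an order that differs from the input line.
printf '%s\n' 'abc=012 aaa=000 cba=210 bbb=111' | pick_fields 'bbb,cba,aaa'
```

The fields come out in the order bbb, cba, aaa, which is the order given in the list and not the order in the input line.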
Upvotes: 2