stack

Reputation: 841

How to select the content of different columns in a file with the Linux shell?

I have a file with about 3,000 lines.

Here are five lines selected from it:

user=bio-wangxf group=bio-jinwf etime=1556506215 start=1556506216 unique_node_count=1 end=1556524815 Exit_status=0
user=bio-wangxf group=bio-jinwf jobname=cellranger start=1556506216 end=1556555583 Exit_status=0 resources_used.cput=338425
user=maad-inspur01 group=maad-huangsd jobname=2d-1d9-4.3-1152-RK2 queue=cal-l start=1554626044 exec_host=cu017/0-23 end=1554626044
user=maad-inspur01 group=maad-huangsd jobname=testmatlab queue=cal-l ctime=1554632326 qtime=1554632326 etime=1554632326 start=1554632328 owner=maad-inspur01@ln01 exec_host=cu191/0-11 Resource_List.nodect=1 Resource_List.nodes=1:ppn=12 session=15549 unique_node_count=1 end=1554643410 Exit_status=0 resources_used.cput=7102 resources_used.mem=31315760kb resources_used.vmem=96803568kb resources_used.walltime=03:04:42
user=iese-liul group=iese-zhengchm jobname=ssh queue=fat ctime=1555483302 qtime=1555483302 etime=1555483302 start=1555489505 owner=iese-liul@ln04 exec_host=fat02/0-17,126-142 Resource_List.neednodes=1:ppn=35 Resource_List.nodect=1 Resource_List.nodes=1:ppn=35 Resource_List.walltime=72:00:00 session=31961 total_execution_slots=35 unique_node_count=1 end=1555498389 Exit_status=0 resources_used.cput=38523 

Now I want to select only the user, group, start and end fields.

The correct result should be like:

user=bio-wangxf group=bio-jinwf start=1556506216 end=1556524815
user=bio-wangxf group=bio-jinwf start=1556506216 end=1556555583
user=maad-inspur01 group=maad-huangsd start=1554626044 end=1554626044
user=maad-inspur01 group=maad-huangsd start=1554632328 end=1554643410
user=iese-liul group=iese-zhengchm start=1555489505 end=1555498389

Because each row has a different number of columns, I cannot select the fields by fixed position with awk.

I have tried:

awk '{if($15~/end/) print $1" "$2" "$4" "$15; else if($18~/end/) print $1" "$2" "$8" "$18}' filename

I do not get the correct result. Some lines are missed, because start and end are not in fixed columns.

Who can help me?

Upvotes: 2

Views: 94

Answers (4)

Vijay

Reputation: 67211

If you are OK with Perl, check the solution below:

perl -lane 'print join " ", grep { /^(?:user|group|start|end)=/ } @F' your_file

Upvotes: 0

James Brown

Reputation: 37394

You can still use awk:

$ awk '{
    for(i=1;i<=NF;i++)                       # loop fields
        if($i~/^(user|group|start|end)=/)    # look for keyword
            b=b (b==""?"":OFS) $i            # buffer matching field
    print b                                  # print buffer
    b=""                                     # reset and repeat
}' file

Output:

user=bio-wangxf group=bio-jinwf start=1556506216 end=1556524815
user=bio-wangxf group=bio-jinwf start=1556506216 end=1556555583
user=maad-inspur01 group=maad-huangsd start=1554626044 end=1554626044
user=maad-inspur01 group=maad-huangsd start=1554632328 end=1554643410
user=iese-liul group=iese-zhengchm start=1555489505 end=1555498389

Fields will be output in original order.
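
To illustrate what "original order" means, here is a small sketch using a made-up record in which end comes before user and start. The record is an assumption for the demo and is not taken from the asker's file:

```shell
# Hypothetical record: "end" appears first, "start" last.
line='end=1554626044 queue=cal-l user=maad-inspur01 start=1554626044'

out=$(printf '%s\n' "$line" | awk '{
    for (i = 1; i <= NF; i++)                    # loop over the fields
        if ($i ~ /^(user|group|start|end)=/)     # keep only the wanted keys
            b = b (b == "" ? "" : OFS) $i        # append to the buffer
    print b
    b = ""
}')

printf '%s\n' "$out"
```

The matched fields come out in the order they appear on the input line, not in the order user, group, start, end.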

Upvotes: 4

kvantour

Reputation: 26471

When a file consists of records (lines) made of key-value pairs of the form key1=value1_FS_key2=value2_FS_key3=value3 ..., where _FS_ is the field separator (delimiter), I generally store all key-value pairs in an array indexed by key, so that the key can be used to look up the value or the object of interest. In this case the object of interest is the complete key=value pair.

In awk this reads like:

awk '{for(i=1;i<=NF;++i) if(match($i,"=")) a[substr($i,1,RSTART-1)]=$i}
     { print a["user"],a["group"],a["start"],a["end"] }
     { delete a }' file

This method is extremely flexible and POSIX compliant. The following modifications are easily made:

  • Change the field separator: awk 'BEGIN{FS=OFS=";"}{...}'
  • Change the fields you want to output: just update the print statement

Of course, a problem can arise when you want to print a key that is not present in the line. If, for example, "group" is missing from a line, the command above would print something like:

user=bio-wangxf  start=1556506216 end=1556555583

This might not be what you want. Maybe you would prefer something like:

user=bio-wangxf group=NA start=1556506216 end=1556555583

This can be done with a simple function:

awk 'function lookup(key) { return (key in a ? a[key] : key"=NA") }
     {for(i=1;i<=NF;++i) if(match($i,"=")) a[substr($i,1,RSTART-1)]=$i}
     { print lookup("user"),lookup("group"),lookup("start"),lookup("end") }
     { delete a }' file

Upvotes: 1

tshiono

Reputation: 22012

Please try the following:

awk '
BEGIN {f["user"] = f["group"] = f["start"] = f["end"] = 1}
{for (i=1; i<=NF; i++) {
    split($i, a, "=")
    if (f[a[1]]) printf("%s ", $i)
 }
print ""
}' filename

The downside is that each output line ends with an extra space.
Hope this helps.
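
One possible way to avoid the trailing space is to print the separator before every match except the first. This is only a sketch; the sep variable is introduced here for illustration and is not part of the original command:

```shell
# First sample line from the question.
line='user=bio-wangxf group=bio-jinwf etime=1556506215 start=1556506216 unique_node_count=1 end=1556524815 Exit_status=0'

out=$(printf '%s\n' "$line" | awk '
BEGIN { f["user"] = f["group"] = f["start"] = f["end"] = 1 }
{
    sep = ""                          # nothing before the first match
    for (i = 1; i <= NF; i++) {
        split($i, a, "=")
        if (a[1] in f) {              # "in" tests membership without creating entries
            printf("%s%s", sep, $i)
            sep = " "                 # later matches are preceded by one space
        }
    }
    print ""                          # end the output line
}')

printf '%s\n' "$out"
```

With this change each output line ends directly after the last selected field.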

Upvotes: 0
