rvbarreto

Reputation: 691

AWK - use column value from input to create new columns on output

I have a file like this:

       Date     Time   Level Gridsize    Miss      Temp    Parameter ID
 1988-05-16 12:00:00      -3        1       0     27.060    -1            
 1988-05-16 12:00:00      -3        1       0     9.0300    -2            
 1988-05-16 12:00:00      -3        1       0     1.2000    -3            
 1988-05-17 12:00:00      -3        1       0     27.100    -1            
 1988-05-17 12:00:00      -3        1       0     9.0200    -2            
 1988-05-17 12:00:00      -3        1       0     1.2300    -3            
 1988-05-18 12:00:00      -3        1       0     27.190    -1            
 1988-05-18 12:00:00      -3        1       0     9.0400    -2            
 1988-05-18 12:00:00      -3        1       0     1.2200    -3            

These are temperature data from sensors in different locations. The Parameter ID field indicates whether the reading was taken by sensor -1, -2 or -3. I'm exporting this data to a CSV file to analyze it. The desired format is:

   Date     Time           -1       -2        -3    
1988-05-16 12:00:00      27.060    9.0300    1.2000                     
1988-05-17 12:00:00      27.100    9.0200    1.2300
1988-05-18 12:00:00      27.190    9.0400    1.2200   

It groups the data by date and time, and splits the Temp values into columns according to the value of Parameter ID.

I'm not sure how doable this is with AWK, but since I mostly work over SSH to prepare this data, an AWK solution is very interesting for me. I'm also open to other tools available in bash (sed, whatever), but it has to be native Linux commands because I'm not allowed to install anything on the server :)


What I'm doing nowadays

Currently I run one command per sensor:

$ awk '$NF == "-1" {print $1";"$2";"$6}' file > netcdf_extract_1.csv
$ awk '$NF == "-2" {print $1";"$2";"$6}' file > netcdf_extract_2.csv
$ awk '$NF == "-3" {print $1";"$2";"$6}' file > netcdf_extract_3.csv

Then I import them into Python, group them by date and time, and get my table.
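For reference, that Python grouping step can be sketched with only the standard library (no extra installs). This is a sketch under assumptions: the file names match the commands above, and each file holds `Date;Time;Temp` rows as produced by the awk one-liners.

```python
import csv
from collections import defaultdict

def merge_sensor_files(paths):
    """Merge per-sensor CSVs (Date;Time;Temp) into one row per Date/Time.

    paths is assumed to be ordered [sensor -1 file, sensor -2 file, sensor -3 file].
    """
    table = defaultdict(dict)  # (date, time) -> {"-1": temp, "-2": temp, "-3": temp}
    for sensor_num, path in enumerate(paths, start=1):
        with open(path, newline="") as f:
            for date, time, temp in csv.reader(f, delimiter=";"):
                table[(date, time)][f"-{sensor_num}"] = temp

    rows = [["Date", "Time", "-1", "-2", "-3"]]
    for (date, time), temps in sorted(table.items()):
        # Missing readings become empty cells rather than breaking the row.
        rows.append([date, time] + [temps.get(s, "") for s in ("-1", "-2", "-3")])
    return rows
```

Usage: `merge_sensor_files(["netcdf_extract_1.csv", "netcdf_extract_2.csv", "netcdf_extract_3.csv"])` returns the pivoted table as a list of rows, ready for `csv.writer`.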

Upvotes: 1

Views: 299

Answers (2)

JFS31

Reputation: 518

Here is my attempt:

awk 'BEGIN{print "Date\t","Time\t", "-1\t", "-2\t", "-3\t";
    PROCINFO["sorted_in"] = "@ind_str_asc" }
    NR > 1{ a[$1] = a[$1]" " $6 ; b[$1] = $2;next }
    END{ for ( i in a ) print i, b[i], a[i] }' file1 | column -t

Output:

Date        Time      -1      -2      -3
1988-05-16  12:00:00  27.060  9.0300  1.2000
1988-05-17  12:00:00  27.100  9.0200  1.2300
1988-05-18  12:00:00  27.190  9.0400  1.2200

Since Roman keeps everything in one array, I split the task into two arrays instead.

Upvotes: 1

RomanPerekhrest

Reputation: 92854

GNU awk solution:

awk 'BEGIN{ PROCINFO["sorted_in"]="@ind_str_asc"; }
     NR > 1{ a[$1"\t"$2][$7]=$6 }
     END{ 
         printf "Date\tTime\t-1\t-2\t-3\n"; 
         for (i in a) print i,a[i][-1],a[i][-2],a[i][-3] 
     }' file | column -t

  • a[$1"\t"$2][$7]=$6 :
    • a - multidimensional array (array of arrays)
    • $1"\t"$2 - outer key built by concatenating the Date and Time fields
    • [$7] - inner array indexed by the Parameter ID field value
    • $6 - the Temp field value being stored

The output:

Date        Time      -1      -2      -3
1988-05-16  12:00:00  27.060  9.0300  1.2000
1988-05-17  12:00:00  27.100  9.0200  1.2300
1988-05-18  12:00:00  27.190  9.0400  1.2200
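Since you can't install anything on the server, note that PROCINFO["sorted_in"] and arrays of arrays are gawk-only features. If the server's awk is not GNU awk, a portable sketch of the same idea is a flat array keyed on Date/Time (awk joins the comma-separated subscripts with SUBSEP), with `sort` doing the ordering afterwards:

```shell
# Sample input in the question's format (header + data rows).
cat > file <<'EOF'
       Date     Time   Level Gridsize    Miss      Temp    Parameter ID
 1988-05-16 12:00:00      -3        1       0     27.060    -1
 1988-05-16 12:00:00      -3        1       0     9.0300    -2
 1988-05-16 12:00:00      -3        1       0     1.2000    -3
 1988-05-17 12:00:00      -3        1       0     27.100    -1
 1988-05-17 12:00:00      -3        1       0     9.0200    -2
 1988-05-17 12:00:00      -3        1       0     1.2300    -3
EOF

# Header is printed separately so `sort` only orders the data rows;
# a[k, $7] uses a flat SUBSEP key instead of gawk's arrays of arrays.
{ printf 'Date\tTime\t-1\t-2\t-3\n'
  awk 'NR > 1 { k = $1 "\t" $2; a[k, $7] = $6; seen[k] = 1 }
       END { for (k in seen) print k "\t" a[k,-1] "\t" a[k,-2] "\t" a[k,-3] }' file | sort
} | column -t > out.txt
cat out.txt
```

The output matches the gawk version for this input; unlike the gawk version, row order comes from `sort` rather than from the array traversal order.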

Upvotes: 2
