Hemant
Hemant

Reputation: 94

Transpose in Unix

I have an hourly data in file this way

2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABC|673
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AABC|52
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABCD|787
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ADFGE|35
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AGER|41
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ETECFF|1384
2015-09-03 02:00:00 to 2015-09-03 02:59:59|TRIFD|38
2015-09-03 02:00:00 to 2015-09-03 02:59:59|CVGFFHG|166
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|31
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ASBF|38
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|35
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|39
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|537
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|620635
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|37
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|702
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|319
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|306
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|34
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|44
2015-09-03 03:00:00 to 2015-09-03 03:59:59|CVGFFHG|599
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|30
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|82

I want to transpose the data,

1. Column 1 should go in as column header 
2. Column 2 should go in row header
3. Column 3 is data
4. Any absence of data should be represented as 0 (Zero)

Here is how the transposed data should look like

|2015-09-03 02:00:00 to 2015-09-03 02:59:59|2015-09-03 03:00:00 to 2015-09-03 03:59:59
AABC|52|737
ABC|0|73
ABCD|787|352
ADFGE|35|72
AGER|41|339
ASBF|0|38
CVGFFHG|166|599
DFSD|0|113
ETECFF|1384|571
FJREER|0|66
TRIFD|38|620679

I have tried using sed, but that does not work. I am not quite good in awk yet, not yet reached to advanced level, so needed help here

Upvotes: 0

Views: 149

Answers (3)

NeronLeVelu
NeronLeVelu

Reputation: 10039

another awk

awk -F '|' '
  {
  Data[ $1, $2] = $3 + 1
  if( match( Headers, "(^\||)" $1 "(|\|$)" ) == 0 ) Headers = Headers $1 "|"
  if( match( Records, "(^\||)" $2 "(|\|$)" ) == 0 ) Records = Records $2 "|"
  }
END {
  cHeader = split( Headers, aHeader, "|" )
  cRecord = split( Records, aRecord, "|" )

  sub( /\|$/, "", Headers
  print "|" Headers

  for( iRecord = 1; iRecord <= cRecord; iRecord++) {
     printf "%s", aRecord[ 1]
     for( iHeader = 2; iHeader <= cHeader; iHeader++ ) {
        ThisData = Data[ aHeader[ iHeader], aRecord[ iRecord] ]
        printf "|%s", --ThisData
        }
     print
     }
  }
' YourFile
  • use a string as row and column name reminder and muti dimensionnal array for data.
  • use $3 + 1 and later --ThisData for forcing the 0

Upvotes: 0

meuh
meuh

Reputation: 12255

Here's a solution with awk. It holds in 2D array values the sum for all rows with the same keyword key and the same header column index i. At the END all these are printed for each key and column. Array cols is used to detect a change of header column. hdrs is used to keep the headers in the right order for output. keys is just used to keep a list of all the keywords.

awk -F'|' '
{ hdr = $1; key = $2; val = $3;
  if(cols[hdr]==0){
    cols[hdr] = ++column;
    hdrs[column] = hdr;
  }
  i = cols[hdr]
  keys[key] = 1
  values[i, key] += val
}
END{
  for(i = 1;i<=column;i++)
   printf  "|%s", hdrs[i]
  printf "\n"
  n = asorti(keys,sort)
  for(j = 1;j<=n;j++){
     key = sort[j]
     printf "%s",key
     for(i = 1;i<=column;i++)
      printf "|%s", values[i, key]+0
     printf "\n"
  }
}'

Upvotes: 2

Pieter21
Pieter21

Reputation: 1845

I think in awk you can create an array with index of type string, aka a dictionary with column 1 as key.

Each element of that array should be filled with another array with index string: column 2 as key.

Then process each line, creating the new array elements when necessary, and adding column 3 to the value.

For help with the syntax in awk:

http://www.thegeekstuff.com/2010/03/awk-arrays-explained-with-5-practical-examples/

Look at example 1 in section 5 how easy the final solution can be.

Upvotes: 0

Related Questions