Reputation: 94
I have an hourly data in file this way
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABC|673
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AABC|52
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ABCD|787
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ADFGE|35
2015-09-03 02:00:00 to 2015-09-03 02:59:59|AGER|41
2015-09-03 02:00:00 to 2015-09-03 02:59:59|ETECFF|1384
2015-09-03 02:00:00 to 2015-09-03 02:59:59|TRIFD|38
2015-09-03 02:00:00 to 2015-09-03 02:59:59|CVGFFHG|166
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|31
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ASBF|38
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|36
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|35
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|39
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|537
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|620635
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABC|37
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AABC|702
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ABCD|319
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ADFGE|33
2015-09-03 03:00:00 to 2015-09-03 03:59:59|AGER|306
2015-09-03 03:00:00 to 2015-09-03 03:59:59|ETECFF|34
2015-09-03 03:00:00 to 2015-09-03 03:59:59|TRIFD|44
2015-09-03 03:00:00 to 2015-09-03 03:59:59|CVGFFHG|599
2015-09-03 03:00:00 to 2015-09-03 03:59:59|FJREER|30
2015-09-03 03:00:00 to 2015-09-03 03:59:59|DFSD|82
I want to transpose the data,
1. Column 1 should go in as column header
2. Column 2 should go in row header
3. Column 3 is data
4. Any absence of data should be represented as 0 (Zero)
Here is how the transposed data should look like
|2015-09-03 02:00:00 to 2015-09-03 02:59:59|2015-09-03 03:00:00 to 2015-09-03 03:59:59
AABC|52|737
ABC|0|73
ABCD|787|352
ADFGE|35|72
AGER|41|339
ASBF|0|38
CVGFFHG|166|599
DFSD|0|113
ETECFF|1384|571
FJREER|0|66
TRIFD|38|620679
I have tried using sed, but that does not work. I am not quite good in awk yet, not yet reached to advanced level, so needed help here
Upvotes: 0
Views: 149
Reputation: 10039
another awk
awk -F '|' '
{
Data[ $1, $2] = $3 + 1
if( match( Headers, "(^\||)" $1 "(|\|$)" ) == 0 ) Headers = Headers $1 "|"
if( match( Records, "(^\||)" $2 "(|\|$)" ) == 0 ) Records = Records $2 "|"
}
END {
cHeader = split( Headers, aHeader, "|" )
cRecord = split( Records, aRecord, "|" )
sub( /\|$/, "", Headers
print "|" Headers
for( iRecord = 1; iRecord <= cRecord; iRecord++) {
printf "%s", aRecord[ 1]
for( iHeader = 2; iHeader <= cHeader; iHeader++ ) {
ThisData = Data[ aHeader[ iHeader], aRecord[ iRecord] ]
printf "|%s", --ThisData
}
print
}
}
' YourFile
$3 + 1
and later --ThisData
for forcing the 0Upvotes: 0
Reputation: 12255
Here's a solution with awk. It holds in 2D array values
the sum for
all rows with the same keyword key
and the same header column index i
.
At the END
all these are printed for each key and column.
Array cols
is used to detect a change of header column.
hdrs
is used to keep the headers in the right order for output.
keys
is just used to keep a list of all the keywords.
awk -F'|' '
{ hdr = $1; key = $2; val = $3;
if(cols[hdr]==0){
cols[hdr] = ++column;
hdrs[column] = hdr;
}
i = cols[hdr]
keys[key] = 1
values[i, key] += val
}
END{
for(i = 1;i<=column;i++)
printf "|%s", hdrs[i]
printf "\n"
n = asorti(keys,sort)
for(j = 1;j<=n;j++){
key = sort[j]
printf "%s",key
for(i = 1;i<=column;i++)
printf "|%s", values[i, key]+0
printf "\n"
}
}'
Upvotes: 2
Reputation: 1845
I think in awk you can create an array with index of type string, aka a dictionary with column 1 as key.
Each element of that array should be filled with another array with index string: column 2 as key.
Then process each line, creating the new array elements when necessary, and adding column 3 to the value.
For help with the syntax in awk:
http://www.thegeekstuff.com/2010/03/awk-arrays-explained-with-5-practical-examples/
Look at example 1 in section 5 how easy the final solution can be.
Upvotes: 0