Reputation: 2502
I would like to transpose a list of items (key/value pairs) into a table format. The solution can be a bash script, awk, sed, or some other method.
Suppose I have a long list, such as this:
date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value
key2: value
date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value
...
I would like to transpose into a table format with tab or some other separator to look like this:
date and time file size key1 key2
2013-02-21 18:18 PM 1283483 bytes value value
2013-02-21 18:19 PM 1283493 bytes value
...
or like this:
date and time|file size|key1|key2
2013-02-21 18:18 PM|1283483 bytes|value|value
2013-02-21 18:19 PM|1283493 bytes||value
...
I have looked at solutions such as this one: An efficient way to transpose a file in Bash, but my case seems different. That awk solution works only partially for me: it transposes every row into one long list of columns, but I need the columns constrained to the unique set of keys.
awk -F': ' '
{
    for (i = 1; i <= NF; i++) {
        a[NR,i] = $i
    }
}
NF > p { p = NF }
END {
    for (j = 1; j <= p; j++) {
        str = a[1,j]
        for (i = 2; i <= NR; i++) {
            str = str " " a[i,j]
        }
        print str
    }
}' filename
UPDATE
Thanks to everyone who provided a solution. Some of them look very promising, but I think my versions of the tools might be outdated and I am getting some syntax errors. What I am seeing now is that I did not start off with very clear requirements. Kudos to sputnick for being the first to offer a solution before I spelled out the full requirements. I had had a long day when I wrote the question, so it was not very clear.
My goal is to come up with a very generic solution for parsing multiple lists of items into column format. I am thinking the solution does not need to support more than 255 columns. Column names are not going to be known ahead of time; this way the solution will work for anyone, not just me. The two known things are the separator between key/value pairs (": ") and the separator between lists (an empty line). It would be nice to have variables for those, so that they are configurable for others to reuse.
From looking at proposed solutions, I realize that a good approach is to do two passes over the input file. First pass is to gather all the column names, optionally sort them, then print the header. Second to grab the values of the columns and print them.
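A minimal sketch of that two-pass idea in portable awk (the file names sample.txt and table.txt are made up; it assumes the ": " pair separator and blank-line record separator described above):

```shell
# Build a small sample input (records separated by a blank line).
cat > sample.txt <<'EOF'
date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value
key2: value

date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value
EOF

# The same file is given twice, so NR==FNR is true only during pass 1.
awk -F': ' -v OFS='|' '
NR == FNR {                        # pass 1: remember columns in first-seen order
    if (NF >= 2 && !($1 in seen)) { seen[$1]; cols[++n] = $1 }
    next
}
FNR == 1 {                         # pass 2 begins: print the header once
    for (i = 1; i <= n; i++) printf "%s%s", cols[i], (i < n ? OFS : "\n")
}
/^$/ { flush_row(); next }         # blank line ends a record
NF >= 2 { row[$1] = substr($0, length($1) + 3); have = 1 }
END { flush_row() }
function flush_row(   i, line) {
    if (!have) return
    for (i = 1; i <= n; i++) line = (i > 1 ? line OFS : "") row[cols[i]]
    print line
    split("", row)                 # portable way to empty the array
    have = 0
}
' sample.txt sample.txt > table.txt

cat table.txt
```

The substr() call takes everything after the first ": ", so values that contain a colon (like the timestamps) survive intact.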
Upvotes: 4
Views: 5117
Reputation: 329
example:
> ls -aFd * | xargs -L 5 echo | column -t
bras.tcl@ Bras.tpkg/ CctCc.tcl@ Cct.cfg consider.tcl@
cvsknown.tcl@ docs/ evalCmds.tcl@ export/ exported.tcl@
IBras.tcl@ lastMinuteRule.tcl@ main.tcl@ Makefile Makefile.am
Makefile.in makeRule.tcl@ predicates.tcl@ project.cct sourceDeps.tcl@
tclIndex
Upvotes: 1
Reputation: 97938
This does not make any assumptions about the column structure, so it does not try to order them; however, all fields are printed in the same order for all records:
use strict;
use warnings;

my (@db, %f, %fields);
my $counter = 1;

while (<>) {
    my ($field, $value) = (/([^:]*):\s*(.*)\s*$/);
    if (not defined $field) {
        # no "key: value" match - this line separates two records
        push @db, { %f };
        %f = ();
    } else {
        $f{$field} = $value;
        $fields{$field} = $counter++ if not defined $fields{$field};
    }
}
push @db, { %f } if %f;    # last record, if the file does not end with a blank line

#my @fields = sort keys %fields;    # alphabetical order
my @fields = sort { $fields{$a} <=> $fields{$b} } keys %fields;    # first-seen order
# (numeric <=> rather than string cmp, so the order holds past nine columns)

# print header
print join("|", @fields), "\n";

# print rows
for my $row (@db) {
    print join("|", map { defined $row->{$_} ? $row->{$_} : "" } @fields), "\n";
}
Upvotes: 1
Reputation: 30580
Here's a pure awk solution:
# split lines on ": " and use "|" for output field separator
BEGIN { FS = ": "; i = 0; h = 0; ofs = "|" }

# empty line - increment item count and skip it
# ([[:space:]] rather than GNU awk's \s, to stay pure awk)
/^[[:space:]]*$/ { i++; next }

# normal line - add the item to the object and the header to the header list,
# keeping track of the first-seen order of headers
{
    current[i, $1] = $2
    if (!($1 in headers)) { headers_ordered[h++] = $1 }
    headers[$1]
}

END {
    h--
    # print headers
    for (k = 0; k <= h; k++) {
        printf "%s", headers_ordered[k]
        if (k != h) { printf "%s", ofs }
    }
    print ""
    # print the items for each object
    for (j = 0; j <= i; j++) {
        for (k = 0; k <= h; k++) {
            printf "%s", current[j, headers_ordered[k]]
            if (k != h) { printf "%s", ofs }
        }
        print ""
    }
}
Example input (note that there should be a newline after the last item):
foo: bar
foo2: bar2
foo1: bar
foo: bar3
foo3: bar3
foo2: bar3
Example output:
foo|foo2|foo1|foo3
bar|bar2|bar|
bar3|bar3||bar3
Note: you will probably need to alter this if your data has ": " embedded in it.
Upvotes: 1
Reputation: 54392
Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
    # change this to OFS="\t" for tab delimited output
    OFS="|"
    # treat each record as a set of lines
    RS=""
    FS="\n"
}
{
    # keep a count of the records
    ++i
    # loop through each line in the record
    for (j=1; j<=NF; j++) {
        # split each line in two
        split($j, a, ": ")
        # just holders for the first two lines in the record
        if (j==1) { date = a[1] }
        if (j==2) { size = a[1] }
        # keep a tally of the unique key names
        if (j>=3) { !x[a[1]] }
        # the data in a multidimensional array:
        # record number . key = value
        b[i][a[1]] = a[2]
    }
}
END {
    # sort the unique keys
    m = asorti(x, y)
    # add the two strings to a numerically indexed array
    c[1] = date
    c[2] = size
    # set a variable to continue from
    f = 2
    # loop through the sorted array of unique keys
    for (j=1; j<=m; j++) {
        # build the header line from the file by adding the sorted keys
        r = (r ? r : date OFS size) OFS y[j]
        # continue to add the sorted keys to the numerically indexed array
        c[++f] = y[j]
    }
    # print the header and empty it
    print r
    r = ""
    # loop through the records ('i' is the number of records)
    for (j=1; j<=i; j++) {
        # loop through the subrecords ('f' is the number of unique keys)
        for (k=1; k<=f; k++) {
            # build the output line
            r = (r ? r OFS : "") b[j][c[k]]
        }
        # print and empty it, ready for the next record
        print r
        r = ""
    }
}
Here's the contents of a test file, called file:
date and time: 2013-02-21 18:18 PM
file size: 1283483 bytes
key1: value1
key2: value2
date and time: 2013-02-21 18:19 PM
file size: 1283493 bytes
key2: value2
key1: value1
key3: value3
date and time: 2013-02-21 18:20 PM
file size: 1283494 bytes
key3: value3
key4: value4
date and time: 2013-02-21 18:21 PM
file size: 1283495 bytes
key5: value5
key6: value6
Results:
date and time|file size|key1|key2|key3|key4|key5|key6
2013-02-21 18:18 PM|1283483 bytes|value1|value2||||
2013-02-21 18:19 PM|1283493 bytes|value1|value2|value3|||
2013-02-21 18:20 PM|1283494 bytes|||value3|value4||
2013-02-21 18:21 PM|1283495 bytes|||||value5|value6
Upvotes: 2
Reputation: 185025
use strict; use warnings;

# read the file paragraph by paragraph
$/ = "\n\n";

print "date and time|file size|key1|key2\n";

# parsing the whole file with the magic diamond operator
while (<>) {
    if (/^date and time:\s+(.*)/m) {
        print "$1|";
    }
    if (/^file size:(.*)/m) {
        print "$1|";
    }
    if (/^key1:(.*)/m) {
        print "$1|";
    }
    else {
        print "|";
    }
    if (/^key2:(.*)/m) {
        print "$1\n";
    }
    else {
        print "\n";
    }
}
perl script.pl file
date and time|file size|key1|key2
2013-02-21 18:18 PM| 1283483 bytes| value| value
2013-02-21 18:19 PM| 1283493 bytes|| value
Upvotes: 0