Reputation: 163
I'm trying to join similar groups of lines into a single line. My file is a basic log-type file, but each entry spans three lines followed by a blank line. Example:
Timestamp
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestamp
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4
What I would like is for each block of 3 lines to be on a single, comma-separated line:
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
I could do this with sed and awk if I only had to deal with the key/value lines, but my problem is getting the timestamp onto each line.
I've looked at xargs and paste, but neither seemed to do what I needed.
Upvotes: 1
Views: 3120
Reputation: 58420
This might work for you (GNU sed):
sed -n 'N;N;s/ *[|\n] */,/pg;n' file
Read 3 lines into the pattern space, replace each pipe or newline character (possibly surrounded by spaces) with a comma, print the result if a substitution was made, and throw away the empty line that follows.
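As a quick check, the command can be run against a small inline sample. This is only a sketch: the sample data is made up to match the question, and it assumes GNU sed, because treating \n inside a bracket expression as a newline is a GNU extension.

```shell
# Two sample records, each followed by a blank line (made-up data).
result=$(printf '%s\n' \
  'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' \
  'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' |
  # N;N  append the next two lines, so the pattern space holds three lines
  # s    turn each pipe or embedded newline, plus surrounding spaces, into a comma
  # p    print only when a substitution happened (-n suppresses everything else)
  # n    read the blank separator line so the next cycle starts on a timestamp
  sed -n 'N;N;s/ *[|\n] */,/pg;n')

printf '%s\n' "$result"
```

If the file has no blank line between records, the trailing n would swallow the next timestamp, so the blank separator is a real requirement of this approach.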
Upvotes: 2
Reputation: 5655
Some stupid sed-only tricks:
sed -n -e '/Timestamp/{h;n};s/ | /,/g;H;/^$/{g;s/\n/,/g;s/,$//;p}' file
sed -n
    print only when the p command is used
/Timestamp/{h;n};
    replace the hold space with the Timestamp line, and move on to the next line of input
s/ | /,/g;H;
    replace bars with commas and append to the hold space
/^$/{g;s/\n/,/g;s/,$//;p}
    on blank lines, get the contents of the hold space into the pattern space (g), replace newlines with commas (s/\n/,/g), and finally remove the trailing comma and print the pattern space (s/,$//;p)
Input file:
Timestampa
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4

Timestampb
key1 | val1 | key2 | val2
key3 | val3 | key4 | val4
Output:
Timestampa,key1,val1,key2,val2,key3,val3,key4,val4
Timestampb,key1,val1,key2,val2,key3,val3,key4,val4
s/\n/,/g may be system / sed version dependent.
Upvotes: 0
Reputation: 67497
An alternative solution with sed and paste:
$ sed 's/ *| */,/g;/^$/d' file | paste -d, - - -
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
This reads as: replace each delimiter with a comma, delete empty lines, then paste 3 lines at a time with a comma separator in between.
Upvotes: 4
Reputation: 536
This awk script makes use of the built-in RS variable to simplify moving between records. With RS set to the empty string, everything between blank lines is one record, and FS splits that record on pipes and newlines. That makes $1 the timestamp, which we store in the ts variable, and $2 through $NF our key and value fields, so we iterate through them and append them to an output string. We save the last one for outside the loop so we can avoid a dangling comma. Then we just print the row and move on.
BEGIN{
    RS="";            # Paragraph mode: everything between blank lines is treated as one record
    FS=" *[|\n] *";   # Fields are separated by pipes or newlines, with optional surrounding spaces
}
{
    ts=$1;            # The first field of every record is the timestamp
    # Build up an output buffer while avoiding a dangling ","
    out="";
    for( i=2; i < NF; i++ ){
        out=out$i","
    }
    out=out$NF;
    print ts","out
}
Upvotes: 1
Reputation: 203522
$ awk -v RS= -F'\n| \\| ' -v OFS=',' '{$1=$1}1' file
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
Timestamp,key1,val1,key2,val2,key3,val3,key4,val4
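The one-liner above is terse, so here is a rough unpacking of it. The comments are an interpretation rather than the answerer's own explanation, and the sample data is made up to match the question.

```shell
result=$(printf '%s\n' \
  'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' \
  'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' |
awk '
BEGIN {
    RS  = ""            # paragraph mode: blank lines separate records
    FS  = "\n| \\| "    # a field ends at a newline or at a pipe with one space on each side
    OFS = ","           # fields are joined with commas when the record is rebuilt
}
{
    $1 = $1             # assigning to any field makes awk rebuild $0 using OFS
    print
}
')

printf '%s\n' "$result"
```

Because the field separator expects exactly one space on each side of the pipe, input with irregular spacing would probably need a looser pattern such as " *\\| *".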
Upvotes: 4
Reputation: 133518
Try:
awk '/^Timestamp/ && VAL{print VAL;VAL=$0;next} NF{gsub(/ +\| +/,",");VAL=VAL?VAL OFS $0:$0} END{print VAL}' OFS="," Input_file
If a line starts with the string Timestamp and VAL already has a value, print VAL, set VAL to the current line, and use next to skip all further statements. Otherwise, for every non-empty line (NF is non-zero), globally replace each pipe and its surrounding spaces with a comma, then append the line to VAL, using OFS (a comma) as the separator. The NF test keeps the blank separator lines from adding a trailing comma. Finally, the END section prints VAL once more, because the last record is still held in VAL when the input ends.
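The same buffering idea can be laid out over several lines, which may be easier to follow. This is a sketch on made-up sample data, not the answerer's exact command: the blank-line guard and the check in END are additions made here, and it still assumes every record starts with the literal word Timestamp.

```shell
result=$(printf '%s\n' \
  'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' \
  'Timestamp' 'key1 | val1 | key2 | val2' 'key3 | val3 | key4 | val4' '' |
awk -v OFS=',' '
!NF { next }                       # guard added in this sketch: skip blank separator lines
/^Timestamp/ && VAL {              # a new record starts while an older one is buffered
    print VAL                      # flush the buffered record
    VAL = $0                       # start a new buffer with the timestamp line
    next
}
{
    gsub(/ +\| +/, ",")            # a pipe with surrounding spaces becomes a comma
    VAL = VAL ? VAL OFS $0 : $0    # append the line to the buffer, comma-separated
}
END { if (VAL) print VAL }         # the last record is still buffered at end of input
')

printf '%s\n' "$result"
```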
Upvotes: 0