Reputation: 147
I am new to awk and can't quite figure out the best way to do this. I have thousands of XML files from which I have already removed duplicates and split the fields into a single column in a single file, using sed and awk.
Now I want to assemble that list into a CSV file with multiple fields per line, starting a new line after a fixed number of fields.
Example
1234
2345

345678
4.23456E3
54321
654321
789

87654.100
9876

10.0
1234
2345

345678
4.23456E3
54321
654321
789

87654.100
9876

11.0
Output
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
Thanks
Upvotes: 3
Views: 4797
Reputation: 58401
This might work for you:
paste -sd',,,,,,,,,,,,\n' file | sed 's/,/, /g'
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
or this (GNU sed):
sed ':a;$bb;N;s/\n/&/12;Ta;:b;s/\n/, /g' file
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
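The `paste` command above hard-codes twelve commas for rows of 13 fields. A minimal sketch of the same idea with the row width as a parameter follows; the function name `paste_rows` and its argument are hypothetical, not part of the answer. Like the original, the final `sed` also pads any commas that occur inside the data.

```shell
# paste_rows N: hypothetical helper. Reads one field per line on stdin
# and joins every N lines into one comma-separated row.
paste_rows() {
  n=$1
  # Build N-1 commas for the delimiter list (none at all when N is 1).
  commas=$(awk -v n="$n" 'BEGIN { while (--n > 0) printf "," }')
  # paste cycles through the delimiter list: N-1 commas, then a newline.
  # "-" makes paste read standard input.
  paste -sd "${commas}\n" - | sed 's/,/, /g'
}
```

Usage would be along the lines of `paste_rows 13 < file`.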
Upvotes: 0
Reputation: 36262
One way using sed:
Content of script.sed:
## Label 'a'
:a
## If last line, print what is left in the buffer, substituting
## newlines with commas.
$ {
s/^\n//
s/\n/, /g
p
q
}
## If the buffer contains 12 newlines, it holds a full row of 13 fields,
## so replace the newlines with commas and print. The bare 'b' then branches
## to the end of the script, so the next cycle overwrites the buffer.
/\([^\n]*\n\)\{12\}/ {
s/^\n//
s/\n/, /g
p
b
}
## Here the buffer has not yet reached the field limit for a row, so
## append one more line (N) and loop back to label 'a'
N
ba
Run it like:
sed -nf script.sed infile
With following output:
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
Upvotes: 2
Reputation: 514
If every output line has the same number of fields, say 5, I would do something like:
awk ' { printf("%s",$1); if (NR % 5 == 0) {printf("\n")} else {printf(",")}}' yourfile.txt
NR is the number of lines read by awk and % is the remainder operator. So if the number of lines read is a multiple of 5 (in this case) it will print a line break, otherwise it will print a comma.
This assumes one field per line as in your example and that blank lines in the input will correspond to blank fields in the CSV.
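The same `NR`-based approach can be sketched with the row width passed in as a variable. The function name `join_fields` and the `n` variable are illustrative assumptions, not part of the answer. This version uses `$0` rather than `$1` so a value containing spaces is kept whole, and it writes the separator before each field so a short final row does not end in a stray comma.

```shell
# join_fields N: hypothetical helper. Reads one field per line on stdin
# and prints N fields per row, separated by ", ".
join_fields() {
  awk -v n="$1" '
    {
      # The first field of a row gets no separator; the others get ", ".
      sep = ((NR - 1) % n == 0) ? "" : ", "
      # $0 is the whole input line, so a blank line becomes an empty field.
      printf "%s%s", sep, $0
      # End the row after every n-th input line.
      if (NR % n == 0) printf "\n"
    }
    # Terminate a final partial row if the input is not a multiple of n.
    END { if (NR % n != 0) printf "\n" }
  '
}
```

For the 13-field rows in the question this would be run as `join_fields 13 < yourfile.txt`.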
Upvotes: 2
Reputation: 10717
Is using xargs allowed?
cat input | xargs -L13 -d'\n' | sed -e 's/ /, /g'
I get this output here:
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
It's sort of hacky, though; if you started out with XML, you should consider using XSLT.
Upvotes: 2