Road King

Reputation: 147

Using Awk to create a csv line

I am new to awk and can't quite figure out the best way to do this. I have thousands of XML files. Using sed and awk, I have already removed the duplicates and split the fields into a single column in a single file.

Now I want to assemble the list into a csv file containing multiple fields on one line. After a fixed number of fields I want to start a new line.

Example

1234
2345

345678
4.23456E3
54321
654321
789

87654.100
9876

10.0
1234
2345

345678
4.23456E3
54321
654321
789

87654.100
9876

11.0

Output

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

Thanks

Upvotes: 3

Views: 4797

Answers (4)

potong

Reputation: 58401

This might work for you:

paste -sd',,,,,,,,,,,,\n' file | sed 's/,/, /g'
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0
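
The delimiter list above is 12 commas followed by `\n`, which `paste -s` cycles through, so every 13th input line ends an output line. For a different field count, the list can be generated rather than typed by hand. This is a minimal sketch, not part of the original answer: the variable names (`n`, `commas`, `delims`, `out`) and the temporary sample file are assumptions made for illustration.

```shell
# Hypothetical sample input: two records of 13 lines each; blank lines are empty fields.
input=$(mktemp)
printf '%s\n' 1234 2345 '' 345678 4.23456E3 54321 654321 789 '' 87654.100 9876 '' 10.0 \
              1234 2345 '' 345678 4.23456E3 54321 654321 789 '' 87654.100 9876 '' 11.0 > "$input"

n=13                                              # fields per CSV line
commas=$(printf "%$((n - 1))s" '' | tr ' ' ',')   # n-1 spaces turned into n-1 commas
delims="${commas}\\n"                             # paste reads the two characters \n as a newline

# Join the lines with the cycling delimiter list, then add a space after each comma.
out=$(paste -s -d "$delims" "$input" | sed 's/,/, /g')
printf '%s\n' "$out"
rm -f "$input"
```

Note that the final `sed` step also adds a space after any comma inside a field, so this only suits data whose fields contain no commas.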

or this (GNU sed):

sed ':a;$bb;N;s/\n/&/12;Ta;:b;s/\n/, /g' file
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

Upvotes: 0

Birei

Reputation: 36262

One way using sed:

Content of script.sed:

## Label 'a'
:a

## If last line, print what is left in the buffer, substituting
## newlines with commas.
$ {
    s/^\n//
    s/\n/, /g
    p   
    q   
}

## If the buffer holds 12 newlines (13 fields), the output line is complete:
## replace the newlines with commas, print, and branch with 'b' to the end
## of the script, which starts a new cycle with a fresh buffer.
/\([^\n]*\n\)\{12\}/ {
    s/^\n//
    s/\n/, /g
    p   
    b   
}

## The buffer has not yet reached the field limit for a line, so
## append the next input line (N) and loop back to label 'a'.
N
ba

Run it like:

sed -nf script.sed infile

With following output:

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

Upvotes: 2

Elton Carvalho

Reputation: 514

If every output line has the same number of fields, say 5, I would do something like:

awk ' { printf("%s",$1); if (NR % 5 == 0) {printf("\n")} else {printf(",")}}' yourfile.txt

NR is the number of lines read by awk and % is the remainder operator. So if the number of lines read is a multiple of 5 (in this case) it will print a line break, otherwise it will print a comma.

This assumes one field per line as in your example and that blank lines in the input will correspond to blank fields in the CSV.
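
Applied to the data in the question, the same idea needs 13 fields per line and a comma followed by a space as the separator. The following is a minimal sketch under those assumptions. The variable `n`, the temporary sample file, and the use of `$0` instead of `$1` (so a field containing spaces is kept whole) are illustrative choices, not part of the original command.

```shell
# Hypothetical sample input: two records of 13 lines each; blank lines are empty fields.
input=$(mktemp)
printf '%s\n' 1234 2345 '' 345678 4.23456E3 54321 654321 789 '' 87654.100 9876 '' 10.0 \
              1234 2345 '' 345678 4.23456E3 54321 654321 789 '' 87654.100 9876 '' 11.0 > "$input"

# n is the number of fields per CSV line. After each input line, print a
# newline when NR is a multiple of n, otherwise print ", ".
result=$(awk -v n=13 '{ printf "%s%s", $0, (NR % n == 0 ? "\n" : ", ") }' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

If the number of input lines is not a multiple of `n`, the last output line ends with a trailing ", " and no newline, so the input length is worth checking first.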

Upvotes: 2

cha0site

Reputation: 10717

Is using xargs allowed?

cat input | xargs -L13 -d'\n' | sed -e 's/ /, /g'

I get this output here:

1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 10.0
1234, 2345, , 345678, 4.23456E3, 54321, 654321, 789, , 87654.100, 9876, , 11.0

It's sort of hacky, though. If you started out with XML, you should consider using XSLT.

Upvotes: 2
