Reputation: 23

Replacing multiple characters with a comma using awk, for CSV output

Hi all,

I have the following list, which sometimes contains 10000 entries:

listID.14.1 = STRING: test1
listID.14.2 = STRING: test2
listID.14.3 = STRING: test3
listID.14.4 = STRING: test4
listID.14.5 = STRING: test5 
listID.14.6 = STRING: test6
listID.14.7 = STRING: test7

I want the output to look like this:

test1,test2,test3,...,test7

I used the following code, which I think is more accurate than splitting on newlines, because some lists contain newlines:

awk -F "listID.${listID}.([0-9]+|[0-9]{3}|[0-9]{1,5}) = STRING: " '{print ","$2}'

but the output came out as:

,test1
,test2
,test3
,test4

This is wrong: I want "test1,test2,test3,...,testn" on a single line, and I'm not sure how to modify my code to produce that. One idea was to add a newline at the beginning of the separator, but that didn't work for me; I think I used the wrong format:

awk -F "\nlistID.${listID}.([0-9]+|[0-9]{3}|[0-9]{1,5}) = STRING: " '{print ","$2}'

I also have a second question, about this code:

awk -F "listID.${listID}.([0-9]+|[0-9]{3}|[0-9]{1,5}) = STRING: " '{print ","$2}'

Does the pattern ([0-9]+|[0-9]{3}|[0-9]{1,5}) check for a number between 1 and 10000?

Upvotes: 2

Answers (4)

Jotne

Reputation: 41460

This should do:

awk '{printf "%s,",$NF} END {print ""}' file
test1,test2,test3,test4,test5,test6,test7,

If you do not like the extra comma at the end:

awk '{printf (NR==1?"":",")"%s",$NF} END {print ""}' file
test1,test2,test3,test4,test5,test6,test7

Upvotes: 3

tshiono

Reputation: 22087

If Perl is an option, please try:

perl -lane 'push(@ary, pop(@F)); END {print join(",", @ary)};' list.txt

The -l option automatically removes the record separator from each input line and adds it back to each output line.
The -a option enables auto-split mode, which splits each line on whitespace as AWK does and assigns the fields to the array @F.
pop(@F) returns the last element of @F, which is then pushed onto @ary.
The -n option makes perl iterate over the input records as AWK does.

BTW, to answer your 2nd question: the regex /^([1-9][0-9]{0,3}|10000)$/ matches the numbers from 1 to 10000.

So your last line will be something like:
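A quick way to check a range pattern like this is to feed it every candidate number and count the matches. The sketch below assumes `seq` and `grep -E` are available (both are common but `seq` is not required by POSIX). It also runs the pattern from the question, anchored the same way, for comparison; the variable names are only illustrative.

```shell
# Feed 0..10001 to each pattern and count how many lines match.
# The suggested pattern should accept exactly 1..10000.
count=$(seq 0 10001 | grep -cE '^([1-9][0-9]{0,3}|10000)$')

# The pattern from the question: the first alternative [0-9]+ already
# accepts any run of digits, so 0 and 10001 get through as well.
orig_count=$(seq 0 10001 | grep -cE '^([0-9]+|[0-9]{3}|[0-9]{1,5})$')

echo "suggested pattern matches: $count"
echo "original pattern matches:  $orig_count"
```

The first count should be 10000 and the second 10002, which shows that the original alternation does not restrict the range at all.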

awk -F "listID\.${listID}\.([1-9][0-9]{0,3}|10000) = STRING: " '{printf ",%s", $2}'

although setting FS to a complex string like the one above may not be a good idea. It will not work as you expect, because it does not skip lines that do not match the regex.
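One way around that, as a minimal sketch rather than a definitive solution, is to leave FS alone and use the prefix as a pattern that selects lines, then strip it with sub(). The value listID=14 and the sample input lines are assumptions made up for illustration; the prefix is passed in with -v so the dots can be escaped inside awk.

```shell
listID=14   # assumed value of the shell variable used in the question

result=$(printf '%s\n' \
  'listID.14.1 = STRING: test1' \
  'unrelated.3.1 = STRING: ignored' \
  'listID.14.2 = STRING: test2 ' |
awk -v id="$listID" '
  # Build the prefix regex once; "\\." in an awk string is a literal dot.
  BEGIN { prefix = "^listID\\." id "\\.[0-9]+ = STRING: " }
  # Only lines that start with the prefix are processed; others are skipped.
  $0 ~ prefix {
    sub(prefix, "")        # drop the prefix, keep the value
    sub(/[ \t]+$/, "")     # drop trailing blanks such as the one after test5
    out = out sep $0       # sep is empty for the first value only
    sep = ","
  }
  END { print out }')

echo "$result"
```

With the sample input above this prints test1,test2, with no leading or trailing comma. Because the whole remainder of the line is kept, values that contain spaces survive intact.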

Hope this helps.

Upvotes: 0

KamilCuk

Reputation: 141900

Extract the fourth space-separated field from each line and join the values with commas:

cut -d' ' -f4 | paste -sd,

Tested with:

cat <<EOF |
listID.14.1 = STRING: test1
listID.14.2 = STRING: test2
listID.14.3 = STRING: test3
listID.14.4 = STRING: test4
listID.14.5 = STRING: test5 
listID.14.6 = STRING: test6
listID.14.7 = STRING: test7 
EOF
cut -d' ' -f4 | paste -sd,

outputs:

test1,test2,test3,test4,test5,test6,test7

Upvotes: 4

geckos

Reputation: 6299

You can do something like this:

awk '{ a = a","$4 } END {print a }' < foo

where foo is the file containing your data. This will leave a leading comma:

,test1,test2,test3,test4,test5,test6,test7

You can remove it by piping the output through sed: | sed 's/^,//'

Upvotes: 2
