Reputation: 139
I've got a CSV that I'm trying to process, but some of my fields contain commas, line breaks and spaces, and now that I think about it, there are probably some apostrophes in there too.
For the commas and line breaks, I convert them to other strings at the output phase and convert them back at the end (yes, it's messy, but I only need to run this once). I realise that I may have to do this with the spaces too, but I've broken the problem down to its basic parts to see if I can work around it.
Here's an input.csv
"john","beatles.com","arse","[email protected]","1","1","on holiday"
"paul","beatles.com","bung","","0","1","also on holiday"
(I've tried with and without quotes)
Here's the script
INPUT="input.csv"
for i in `cat ${INPUT}`
do
#USERNAME=`echo $i | awk -v FS=',' '{print $1}'`
USERNAME=`echo $i | awk 'BEGIN{FS="[|,:]"} ; {print $1}'`
echo "username: $USERNAME"
done
So that should just output john and paul, but instead I get
username: "john"
username: holiday"
username: "paul"
username: on
username: holiday"
because it sees the spaces and interprets them as new rows.
Can I get it to stop that?
Upvotes: 5
Views: 14907
Reputation: 5973
An awk-free solution:
cut -d, -f1 input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done
The above assumes you want to keep the quotes. If not...
cut -d, -f1 input.csv | sed 's,^",,;s,"$,,' | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done
Both of the above also assume there are no commas in your field contents. If that's not true, use a "proper" CSV parser in your favorite scripting language. Example...
ruby -rcsv -ne 'puts CSV.parse_line($_)[0]' input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done
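To make the comma caveat concrete, here is a minimal sketch of what cut does with a field that contains a comma. The row is a made-up example, not taken from the question's data.

```shell
#!/bin/sh
# Hypothetical row whose first field contains a comma.
row='"lennon, john","beatles.com"'

# cut has no notion of quoting: it splits at the first comma it finds,
# even though that comma sits inside the quoted field.
first=$(printf '%s\n' "$row" | cut -d, -f1)
echo "cut sees: $first"
```

Here cut returns only the text before the embedded comma, which is why a real CSV parser is the safer choice for data like this.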
Upvotes: 0
Reputation: 785098
You can use any regex as the field separator in awk, e.g. an optional comma followed by a double quote:
awk -F ',?"' '{print $2, $4, $6, $8, $10, $12, "<" $14 ">"}' f1
john beatles.com arse [email protected] 1 1 <on holiday>
paul beatles.com bung 0 1 <also on holiday>
The last field, $14, is enclosed in < and > to show that it ends up in a single awk variable.
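Building on the same separator, here is a rough sketch of a small helper that pulls one field out by position and feeds it to a shell loop. The function name csv_field and the shortened sample rows are invented for illustration. It assumes every field is wrapped in double quotes and that no field contains a double quote or a line break.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer above): print the Nth quoted
# field of every line in a file. With FS set to ',?"', the first awk
# field is the empty string before the opening quote, so CSV field N
# ends up in $(2*N).
csv_field() {
    awk -F ',?"' -v n="$1" '{ print $(2 * n) }' "$2"
}

# Shortened, made-up version of the question's input.csv.
sample=$(mktemp)
printf '%s\n' '"john","beatles.com","on holiday"' \
              '"paul","beatles.com","also on holiday"' > "$sample"

# One loop iteration per line; spaces inside a field are left alone.
csv_field 1 "$sample" | while IFS= read -r USERNAME; do
    echo "username: $USERNAME"
done

rm -f "$sample"
```

Calling csv_field 3 on the same file would print the status column, with "on holiday" kept as a single value.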
Upvotes: 2
Reputation: 123458
It's not awk, but the shell (the default value of IFS) that's causing word splitting.
You could fix that by saying:
while read -r i; do
    USERNAME=$(echo "$i" | awk 'BEGIN{FS="[|,:]"} ; {print $1}')
    echo "username: $USERNAME"
done < "$INPUT"
In order to verify how the shell is reading the input, add
echo "This is a line: ${i}"
in the loop.
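To see the difference side by side, here is a minimal sketch that counts how many items each loop style produces. The sample file and the counter names are made up for illustration; the rows are shortened versions of the ones in the question.

```shell
#!/bin/sh
# Hypothetical two-row sample, shortened from the question's input.csv.
INPUT=$(mktemp)
printf '%s\n' '"john","beatles.com","on holiday"' \
              '"paul","beatles.com","also on holiday"' > "$INPUT"

# for + unquoted command substitution: the shell splits the output on
# every character in IFS (space, tab and newline by default), so each
# space inside a field starts a new loop item.
for_count=0
for i in $(cat "$INPUT"); do
    for_count=$((for_count + 1))
done

# while + read: one iteration per line. IFS= keeps leading and trailing
# whitespace, and -r stops backslashes from being interpreted.
read_count=0
while IFS= read -r i; do
    read_count=$((read_count + 1))
done < "$INPUT"

echo "for loop items:  $for_count"
echo "read loop items: $read_count"
rm -f "$INPUT"
```

With this sample the for loop sees 5 items, while the read loop sees 2, one per line of the file.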
Upvotes: 3
Reputation: 2280
A few things to note: you don't need to use cat or a for loop. Unless I am missing the bigger picture...
What happens when you call awk on the file?
awk -F"," '{print $1}' input.csv
I get the following:
$ awk -F"," '{print $1}' input.csv
"john"
"paul"
$
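If the quotes are not wanted, awk can strip them in the same pass. This is only a sketch on a shortened, made-up sample, and it still assumes the first field contains no comma.

```shell
#!/bin/sh
# Shortened, made-up version of the question's input.csv.
sample=$(mktemp)
printf '%s\n' '"john","beatles.com","on holiday"' \
              '"paul","beatles.com","also on holiday"' > "$sample"

# gsub removes every double quote from the first field before printing.
usernames=$(awk -F',' '{ gsub(/"/, "", $1); print "username: " $1 }' "$sample")
printf '%s\n' "$usernames"

rm -f "$sample"
```

This prints "username: john" and "username: paul" without any shell loop at all.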
Upvotes: 3