vmos

Reputation: 139

How do I get awk to NOT use space as a delimiter?

I've got a CSV that I'm trying to process, but some of my fields contain commas, line breaks and spaces, and now that I think about it, there are probably some apostrophes in there too.

For the commas and line breaks, I've converted them to other strings at the output phase and convert them back at the end (yes, it's messy, but I only need to run this once). I realise that I may have to do this with the spaces too, but I've broken the problem down to its basic parts to see if I can work around it.

Here's an input.csv

"john","beatles.com","arse","[email protected]","1","1","on holiday"
"paul","beatles.com","bung","","0","1","also on holiday"

(I've tried with and without quotes)

Here's the script:

INPUT="input.csv"

for i in `cat ${INPUT}`

do
#USERNAME=`echo $i | awk -v  FS=',' '{print $1}'`
USERNAME=`echo $i | awk 'BEGIN{FS="[|,:]"} ; {print $1}'`
echo "username: $USERNAME"

done

So that should just output john and paul, but instead I get

username: "john"
username: holiday"
username: "paul"
username: on
username: holiday"

because it sees the spaces and interprets them as new rows.

Can I get it to stop that?

Upvotes: 5

Views: 14907

Answers (4)

pobrelkey

Reputation: 5973

An awk-free solution:

cut -d, -f1 input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

The above assumes you want to keep the quotes. If not...

cut -d, -f1 input.csv | sed 's,^",,;s,"$,,' | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

Both of the above also assume there are no commas in your field contents. If that's not true, use a "proper" CSV parser in your favorite scripting language. Example...

ruby -rcsv -ne 'puts CSV.parse_line($_)[0]' input.csv | while read -r USERNAME ; do echo "username: ${USERNAME}" ; done

Upvotes: 0

anubhava

Reputation: 785098

You can use any regex as the field separator in awk, e.g. an optional comma followed by a double quote:

awk -F ',?"' '{print $2, $4, $6, $8, $10, $12, "<" $14 ">"}' f1
john beatles.com arse [email protected] 1 1 <on holiday>
paul beatles.com bung  0 1 <also on holiday>

The last field, $14, is enclosed in < and > to show that it ends up in a single awk variable, spaces included.
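As a rough check of how this separator behaves when a quoted field itself contains a comma, here is a minimal sketch. The sample line is made up for illustration and is not from the question's input. With this separator the real values land in the even-numbered fields, and a comma that is not followed by a double quote stays inside its field.

```shell
#!/bin/sh
# Hypothetical sample line (not from the question): the second field
# contains both a space and a comma.
line='"john","on holiday, back monday"'

# FS is the regex ,?" (an optional comma followed by a double quote).
# The real values end up in the even-numbered fields: $2, $4, ...
result=$(printf '%s\n' "$line" | awk -F ',?"' '{ print $2 "|" $4 }')

echo "$result"
```

This is only a sketch: a field that contains an escaped double quote, or a line break, would still confuse it, so a real CSV parser is the safer choice for such data.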

Upvotes: 2

devnull

Reputation: 123458

It's not awk, but the shell (the default value of IFS) that's causing word splitting.

You could fix that by saying:

while read -r i; do
  USERNAME=$(echo "$i" | awk 'BEGIN{FS="[|,:]"} ; {print $1}');
  echo "username: $USERNAME";
done < "$INPUT"

In order to verify how the shell is reading the input, add

echo "This is a line: ${i}"

in the loop.
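To see the difference in isolation, here is a minimal sketch that counts loop iterations both ways. The temporary file and its two sample lines are assumptions made up for the demo, not the question's real data.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary file for the demo.
INPUT=$(mktemp)
printf '%s\n' \
  '"john","beatles.com","on holiday"' \
  '"paul","beatles.com","also on holiday"' > "$INPUT"

# Unquoted command substitution: the shell splits the file contents on
# spaces, tabs and newlines (the default IFS), so each word is one item.
for_count=0
for i in $(cat "$INPUT"); do
  for_count=$((for_count + 1))
done

# read -r with an empty IFS takes one whole line per iteration.
read_count=0
while IFS= read -r i; do
  read_count=$((read_count + 1))
done < "$INPUT"

echo "for loop iterations: $for_count"
echo "while read iterations: $read_count"

rm -f "$INPUT"
```

With these two sample lines the for loop runs five times (once per whitespace-separated word) while the while loop runs twice (once per line), and awk never gets a say in either case.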

Upvotes: 3

Timothy Brown

Reputation: 2280

A few things to note: you don't need cat or a for loop here, unless I am missing the bigger picture...

What happens when you call awk on the file?

awk -F"," '{print $1}' input.csv

I get the following:

$ awk -F"," '{print $1}' input.csv
"john"
"paul"
$
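If the quotes are not wanted, the same single awk call can strip them and add the label. This is a minimal sketch, assuming the first field never contains a comma or a line break; the temporary file and its sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary file for the demo.
INPUT=$(mktemp)
printf '%s\n' \
  '"john","beatles.com","on holiday"' \
  '"paul","beatles.com","also on holiday"' > "$INPUT"

# -F, splits on commas only, so spaces inside fields are left alone.
# gsub removes the double quotes from the first field before printing.
usernames=$(awk -F, '{ gsub(/"/, "", $1); print "username: " $1 }' "$INPUT")

printf '%s\n' "$usernames"

rm -f "$INPUT"
```

Because awk reads the file line by line itself, the shell never splits the input on spaces.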

Upvotes: 3
