pagid
pagid

Reputation: 13867

Splitting one CSV into multiple files based on field value

I do have a CSV which looks like this*:

system,subject,value1,value2
example.org,thing 1,100,4
exmaple.org,thing 2,90,0
example.com,thing 1,200,0
example.com,thing 5,10,10

The header us actually not included, but shown here to make it easier to read the example.

And I want to split that up into two files:

example.org.csv with:

thing 1,100,4
thing 2,90,0

example.com.csv with:

thing 1,200,0
thing 5,10,10

My current solution works this way:

while read line; do
  SYSTEM=$(echo "$line" | cut -d, -f1)
  NOTTHESYSTEM=$(echo "$line" | cut -d, -f2-)
  echo "${NOTTHESYSTEM}" >> "${SYSTEM}.csv"
done <$INPUT

But this is working very inefficient and doesn't perform well with bigger files.

In numbers this means that a 52050 line/ 9 MB file needs about 250 secounds to finish the split.

Any suggestions how to improve the script above are welcome.

Cheers

Upvotes: 3

Views: 1128

Answers (2)

Sato Katsura
Sato Katsura

Reputation: 3086

One way to do it, assuming you have a reasonably small number of systems:

cut -d, -f1 file.csv | \
    sort -u | \
    while read -r system; do
        fgrep -w "$system" file.csv | cut -d, -f2- >"$system".csv
    done

Upvotes: 1

anubhava
anubhava

Reputation: 784938

Using awk it will be simpler:

awk 'BEGIN{FS=OFS=","} {print $2, $3, $4 > $1 ".csv"}' "$INPUT"

Verification:

cat example.org.csv
thing 1,100,4
thing 2,90,0

cat example.com.csv
thing 1,200,0
thing 5,10,10

Upvotes: 4

Related Questions