Erik

Reputation: 121

Removing columns from a csv file with different numbers of columns per line

I have this bash script that removes columns from the lines of a given CSV file, but it runs very slowly. I need to use it on files larger than 1 GB, so I'm looking for a faster solution.

#!/bin/bash

while read line; do
    columns=`echo $line | awk '{print NF}' FS=,`
    if [ "$columns" == "9" ]; then
            echo `echo $line | cut -d \, -f 1,5,6,8,9`
    elif [ "$columns" == "24" ]; then
            echo `echo $line | cut -d \, -f 1,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24`
    elif [ "$columns" == "8" ]; then
            echo `echo $line | cut -d \, -f 1,4,5,6,7,8`
    else
            echo $line
    fi
done <$1

If anyone has advice on how to speed this up, or if there's a better way to do it, that'd be awesome. Thanks a lot!

Upvotes: 1

Views: 102

Answers (1)

anubhava

Reputation: 784958

Your entire script can be replaced by a single awk command.

Try this:

awk 'BEGIN{FS=OFS=","}
     NF==9 {print $1, $5, $6, $8, $9; next}
     NF==8 {print $1, $4, $5, $6, $8; next}
NF==24{print $1,$4,$5,$6,$8,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24} "$1"
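
This runs a single awk process over the whole file instead of forking awk and cut once per input line, which is where the original script loses almost all of its time. The final {print} rule passes lines with any other column count through unchanged, like the else branch in your script. A quick sanity check on a couple of made-up rows (illustrative data only):

$ printf 'a,b,c,d,e,f,g,h,i\n1,2,3\n' | awk 'BEGIN{FS=OFS=","} NF==9{print $1,$5,$6,$8,$9; next} {print}'
a,e,f,h,i
1,2,3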

Upvotes: 1
