exAres

Reputation: 4926

How can I process large CSV files efficiently in a shell script, with better performance than the following script?

I have a large CSV file, input_file.dat, with 5 columns. I want to do two things to the second column:

(1) Remove the last character (2) Wrap the result in single quotes

The following are sample rows from input_file.dat:

420374,2014-04-06T18:44:58.314Z,214537888,12462,1
420374,2014-04-06T18:44:58.325Z,214537850,10471,1
281626,2014-04-06T09:40:13.032Z,214535653,1883,1

The sample output would look like:

420374,'2014-04-06T18:44:58.314',214537888,12462,1
420374,'2014-04-06T18:44:58.325',214537850,10471,1
281626,'2014-04-06T09:40:13.032',214535653,1883,1

I have written the following code to do this:

#!/bin/bash
inputfilename=input_file.dat
outputfilename=output_file.dat
count=1

while read line
do
  echo $count
  count=$((count + 1))
  v1=$(echo $line | cut -d ',' -f1)
  v2=$(echo $line | cut -d ',' -f2)
  v3=$(echo $line | cut -d ',' -f3)
  v4=$(echo $line | cut -d ',' -f4)
  v5=$(echo $line | cut -d ',' -f5)
  v2len=${#v2}
  v2len=$((v2len -1))
  newv2=${v2:0:$v2len}
  newv2="'$newv2'"
  row=$v1,$newv2,$v3,$v4,$v5
  echo $row >> $outputfilename
done < $inputfilename

But it takes a lot of time.

Is there a more efficient way to achieve this?

Upvotes: 2

Views: 925

Answers (1)

henfiber

Reputation: 1307

You can do this with awk:

awk -v q="'" 'BEGIN{FS=OFS=","} {$2=q substr($2,1,length($2)-1) q}1' input_file.dat

How it works:

  • BEGIN{FS=OFS=","} : set both the input and the output field separator (FS, OFS) to a comma.
  • -v q="'" : assign a literal single quote to the variable q (to avoid complex escaping inside the awk program).
  • {$2=q substr($2,1,length($2)-1) q} : replace the second field ($2) with a single quote (q), followed by the second field minus its last character (via substr(string, start, length)), followed by a closing single quote (q).
  • 1 : an always-true pattern with no action, so awk runs the default action, which prints the current (edited) line.
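
As a quick check, the command can be run on the three sample rows from the question, with the result written to a file instead of the terminal. This is only a minimal sketch: the temporary directory and the variable name workdir are assumptions made for illustration, and mktemp -d is assumed to be available on your system.

```shell
#!/bin/sh
# Sketch: apply the awk one-liner to the sample rows from the question.
# workdir is a hypothetical scratch directory, not part of the original post.
workdir=$(mktemp -d)

# Recreate the sample input shown in the question.
cat > "$workdir/input_file.dat" <<'EOF'
420374,2014-04-06T18:44:58.314Z,214537888,12462,1
420374,2014-04-06T18:44:58.325Z,214537850,10471,1
281626,2014-04-06T09:40:13.032Z,214535653,1883,1
EOF

# Same awk program as above; the output is redirected to a file.
awk -v q="'" 'BEGIN{FS=OFS=","} {$2=q substr($2,1,length($2)-1) q}1' \
    "$workdir/input_file.dat" > "$workdir/output_file.dat"

# Show the result.
cat "$workdir/output_file.dat"
```

The whole file is handled by a single awk process, whereas the loop in the question starts five echo | cut pipelines for every input line, which is most likely where the time goes.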

Upvotes: 2
