Programmer

Reputation: 439

Shell script (running sed in loop) not completing with large files

Given below is my code for converting the data in the file.

cat Report.csv > /tmp/report.part

while read -r line
do
    timestamp=$(echo "$line" | awk -F, '{print $1}')
    converted=$(ssboetod "$timestamp")
    sed -i "s/$timestamp/$converted/g" Report.csv
done < /tmp/report.part

My input file has data as given below:

1424412109,ABC
1424407352,XYZ
1424424533,DEF

Expected output is:

Fri Feb 20 11:31:49 2015,ABC
Fri Feb 20 10:12:32 2015,XYZ 
Fri Feb 20 14:58:53 2015,DEF 

Looking at the above code and the files, I think it is clear what is required: I just want to convert the epoch timestamps into human-readable dates. The code works fine; with a small number of rows there is no issue at all. But I am currently working with a large file that has 150,000 records, and the script gets stuck and never exits. Can anyone please help me figure out what I have missed here?

Upvotes: 0

Views: 640

Answers (3)

Programmer

Reputation: 439

Thanks for all your help :)

I changed my while loop as suggested by tripleee. Given below is the code, which works perfectly fine and now finishes in seconds.

# Extract the timestamp column, deduplicate it, emit one sed
# substitution per unique timestamp, and apply them all to the
# file in a single in-place pass.
cut -d, -f1 Report.csv |
sort -u |
while read -r timestamp; do
    converted=$(ssboetod "$timestamp")
    echo "s/$timestamp/$converted/"
done |
sed -i -f - Report.csv
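
For anyone else trying this whose sed does not accept -f - (see tripleee's caveat below), the same approach should work by writing the generated script to a temporary file first. A sketch (note that BSD sed also wants an explicit suffix argument after -i, e.g. sed -i ''):

tmpscript=$(mktemp) || exit 1
trap 'rm -f "$tmpscript"' EXIT

# Same pipeline as above, but the sed script lands in a temp file
cut -d, -f1 Report.csv |
sort -u |
while read -r timestamp; do
    converted=$(ssboetod "$timestamp")
    echo "s/$timestamp/$converted/"
done > "$tmpscript"

sed -i -f "$tmpscript" Report.csv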

Upvotes: 0

declension

Reputation: 4185

You could approach it in a slightly simpler (and much faster) way, by modifying the file only once, using multiple sed replacements:

#! /bin/bash
infile='Report.csv'
script=''

while read -r line
do
    timestamp=$(echo "$line" | awk -F, '{print $1}')
    converted=$(ssboetod "$timestamp")
    # Prepend each substitution so the whole file is edited in one sed call
    script="s/$timestamp/$converted/g; $script"
done < "$infile"

cp "$infile" .backup.csv
sed -i -e "$script" "$infile"

I had to guess what your ssboetod did, so for testing I used:

converted=$(date +'%a %b %d %H:%M:%S %Y' -d "@$timestamp")

which works near enough (bar timezones, maybe).
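
Putting the two together, a self-contained version you can test without ssboetod. A sketch assuming GNU date; the format string matches the expected output in the question:

#! /bin/bash
infile='Report.csv'
script=''

while read -r line
do
    timestamp=$(echo "$line" | awk -F, '{print $1}')
    # GNU date stands in for ssboetod: epoch seconds in, readable date out
    converted=$(date -d "@$timestamp" +'%a %b %d %H:%M:%S %Y')
    script="s/$timestamp/$converted/g; $script"
done < "$infile"

cp "$infile" .backup.csv
sed -i -e "$script" "$infile"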

Upvotes: 1

tripleee

Reputation: 189297

This looks suspiciously similar to an earlier question of yours, but if we assume that the report contains multiple timestamps and you want to convert all of them, maybe try

cut -d, -f1 Report.csv |
sort -u |
while read -r timestamp; do
    converted=$(ssboetod "$timestamp")
    echo "s/$timestamp/$converted/"
done |
sed -i -f - Report.csv

... assuming your sed can tolerate the -f - to read a script from standard input (not all variants can do that, but Linux should be fine).

By opening, reading, and writing back Report.csv from start to end only once (plus one more pass to collect the timestamps), this should be massively faster than your script, which rewrites the entire file once per input line, sometimes needlessly.
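
For example, with the three sample rows from your question (and assuming ssboetod produces the dates shown in your expected output), the loop would feed sed a script like:

s/1424407352/Fri Feb 20 10:12:32 2015/
s/1424412109/Fri Feb 20 11:31:49 2015/
s/1424424533/Fri Feb 20 14:58:53 2015/

sed then performs all of these replacements in a single pass over the file.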

Upvotes: 3
