Reputation: 857
I'm trying to speed up my script, which currently takes approx 30 seconds. I am a novice to bash, and I am sure I am using some bad scripting practice (found some hint in https://unix.stackexchange.com/a/169765 but still cannot fix my issue).
What I need to do is get data from an external file, and extract numbers into two arrays. My script works fine, except it is too slow.
readData=`cat $myfile`
# readData = [[1491476100000,60204],[1491476130000,59734],...,[1491476160000,60150]]
# I have approximately 5000 points (two numbers in each point)
pointTime=()
pointVal=()
for line in `echo $readData | grep -Po "[0-9]+,[0-9]+"`; do
# Get first number but drop last three zeroes (e.g. 1491476100)
pointTime+=(`echo $line | grep -Po "^[0-9]+" | sed "s/\(.*\)000$/\1/"`)
# Get second number, e.g. 60204
pointVal+=(`echo $line | grep -Po "[0-9]+$"`)
done
Maybe I could use some regex inside a parameter expansion, but I don't know how.
Upvotes: 1
Views: 503
Reputation: 189948
I'm suspicious of the requirement to store the results in an array. You probably actually want to loop over the values in pairs. In any event, storing intermediate values in memory is inelegant and wasteful.
grep -Eo '[0-9]+,[0-9]+' "$myfile" |
while IFS=, read -r first second, do
process value pair "${first%000}" "$second"
done
If you insist on storing the values in an array, how to change the body of the loop should be obvious.
pointTime+=("${first%000}")
pointVal+=("$second")
Upvotes: 2
Reputation: 27370
Here's how I would write the script:
mapfile -t points < <(grep -Po '\d+,\d+' "$myfile")
pointTime=("${points[@]%000,*}")
pointVal=("${points[@]#*,}")
or even
mapfile -t pointTime < <(grep -Po '\d+(?=000,)' "$myfile")
mapfile -t pointVal < <(grep -Po ',\K\d+' "$myfile")
when you are sure that the file is well-formed.
You already identified the main problem: The loop is slow, especially, since a lot of programs are called inside the loop. Nevertheless here are some hints how you could have improved your script without throwing away the loop. Some parts were needlessly complicated, for instance
readData=`cat $myfile`
`echo $readData | grep -Po "[0-9]+,[0-9]+"`
can be written as
grep -Po "[0-9]+,[0-9]+" "$myfile"
and
echo $line | grep -Po "^[0-9]+" | sed "s/\(.*\)000$/\1/"
can be written as
grep -Po "^[0-9]+(?=000)" <<< "$line"
A big speed boost would be to use bash's matching operator =~
instead of grep
, because just starting up grep
is slow.
[[ "$line" =~ (.*)000,(.*) ]]
pointTime+=("${BASH_REMATCH[1]}")
pointTime+=("${BASH_REMATCH[2]}")
Upvotes: 2