Reputation: 149
I have a file with hundreds of thousands of records. All records are unique, comma-separated values. The first column can be considered the key, and the second column is the value of interest.
The file size is about 8 to 10 MB. I have to look up these values from time to time in a script. Currently I am using the grep statement below:
myvalue=$(grep $myvar filename | cut -d, -f2)
It works fine, but the real problem is the multiple, sequential lookups against the same file. It is not a very efficient approach: I have to look up values from the same file many times (100-200 or more) during a script run, and each time grep scans the entire file. I want a better/optimized way.
Update: It is important to note that this is a sequential script, and the values of $myvar are generated at runtime, so I can't collect all the values up front and do a combined lookup; it has to be one single-value lookup in each iteration.
Upvotes: 2
Views: 2621
Reputation: 15293
If the file is constructed once and then referenced over and over without being changed in between, you need to use an associative array as a lookup table. That might get big and ugly in bash; consider perl instead.
However, you asked how to do it in bash.
$: eval "declare -A lookup=(
$( sed -E 's/^([^,]+),([^,]+).*/ [\1]=\2/' filename )
)"
Now all the values should be in the table lookup.
An associative array uses strings as its keys instead of integers, so this sets the keys and values as pairs in a table.
sed -E 's/^([^,]+),([^,]+).*/ [\1]=\2/' takes the first and second fields of the comma-delimited file and reformats them into key/value assignments in bash syntax, like this:
declare -A lookup=(
[a]=1
[b]=2
[c]=3 # ... and so on
)
The eval parses all that into the current environment for your use.
No more greps. Just use "${lookup[$myvar]}".
If you just wanted to assign it for readability, then instead of the grep use:
myvalue="${lookup[$myvar]}"
My local example in use:
$: cat x
a,1,lijhgf
b,2,;lsaoidj
c,3,;l'skd
$: echo "declare -A lookup=(
$( sed -E 's/^([^,]+),([^,]+).*/ [\1]=\2/' x )
)"
declare -A lookup=(
[a]=1
[b]=2
[c]=3
)
$: eval "declare -A lookup=(
$( sed -E 's/^([^,]+),([^,]+).*/ [\1]=\2/' x )
)"
$: echo "${lookup[b]}"
2
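For the repeated lookups in the script body, each iteration then reduces to a plain array access. A minimal usage sketch with the example keys above (the for loop just stands in for however the script generates each $myvar at runtime):
# repeated lookups are now O(1) array accesses, no file scan per lookup
for myvar in a b c; do
    myvalue="${lookup[$myvar]}"
    echo "$myvar -> $myvalue"
done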
Upvotes: 3
Reputation: 26481
First of all, let's look at your command:
myvalue=$(grep $myvar filename | cut -d, -f2)
You make use of two binaries (grep and cut) to process the data. You should try to reduce this to a single binary; that alone will already help a lot:
myvalue=$(awk -F, -v var="$myvar" '$0~var { print $2; exit}' filename)
This will be much faster, as only a single process is started and awk stops reading the file as soon as the first match is found (exit).
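Note that $0~var matches $myvar anywhere in the line, not only in the key column. If the lookup should be an exact match on the first field, a stricter variant of the same idea would be:
# sketch: exact match on the key column only, stop at the first hit
myvalue=$(awk -F, -v var="$myvar" '$1 == var { print $2; exit }' filename)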
If you need to do multiple lookups based on the key which is located in the first column, you can do the following in bash:
while IFS= read -r; do
    # each awk output line looks like: [key]=whole,original,line
    declare -A z+="( $REPLY )"
done < <(awk -F, '{print "["$1"]="$0}' lookupfile)
echo "${z[$key]}"
Based on How do I populate a bash associative array with command output?
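If you prefer not to feed generated assignments to declare, an alternative sketch (assuming keys and values contain no newlines; z2 is just an illustrative name) is to populate the array directly with read and a comma IFS, storing only the second column as the value:
declare -A z2
while IFS=, read -r k v _rest; do
    z2[$k]=$v    # key from column 1, value from column 2
done < lookupfile
echo "${z2[$key]}"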
Upvotes: 2
Reputation: 17491
One of the obvious things I'm thinking of is limiting the number of grep results, which can be done with the -m switch:
Prompt>cat test.txt
a
a
b
a
b
Prompt>grep "a" test.txt
a
a
a
Prompt>grep -m 1 "a" test.txt
a
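Applied to the command from the question, that would look like this (also quoting $myvar, since it is expanded into the grep pattern):
myvalue=$(grep -m 1 "$myvar" filename | cut -d, -f2)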
Upvotes: 2