Ibraheem

Reputation: 149

Best/optimized way to search/lookup text in a file in BASH scripts

I have a file with hundreds of thousands of records. All of these records are unique, comma-separated values. The first column can be considered the key, and the second column is the value of interest.

The file size is 8 to 10 MB. I have to look up these values from time to time in a script. Currently I am using the grep statement below:

myvalue=$(grep $myvar filename | cut -d, -f2)

It works fine, but the real problem is multiple, sequential lookups against the same file. This does not seem very efficient, since I look up values from the same file many times (100-200 times) during a script run, and each lookup greps the entire file. I want a better, more optimized way.

Update: it is important to note that this is a sequential script, and all the values of $myvar are generated at runtime, so I can't have all the values available up front and do a combined lookup; it has to be one value lookup in each iteration.

Upvotes: 2

Views: 2621

Answers (3)

Paul Hodges

Reputation: 15293

If the file is constructed once and then referenced over and over without being changed in between, you need to use an associative array as a lookup table. That might get big and ugly in bash; consider perl instead.

However, you asked how to do it in bash.

$: eval "declare -A lookup=(
   $( sed -E 's/^([^,]+),([^,]+).*/  [\1]=\2/' filename )
   )" 

Now all the values should be in the table lookup.

An associative array uses strings as its keys instead of integers, so this sets the keys and values as pairs in a table.

sed -E 's/^([^,]+),([^,]+).*/  [\1]=\2/'

takes the first and second fields of the comma-delimited file and reformats them into key/value assignments in bash syntax, like this:

declare -A lookup=(
   [a]=1
   [b]=2
   [c]=3 # ... and so on
) 

The eval parses all that into the current environment for your use.

No more greps. Just use "${lookup[$myvar]}".
If you want to assign it to a variable for readability, then instead of the grep use

myvalue="${lookup[$myvar]}"
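If you would rather avoid eval entirely, the same table can be built with a plain read loop. A minimal sketch, assuming the fields contain no embedded newlines (the file name and sample data here are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical sample data in the same key,value,... shape as the question's file.
printf '%s\n' 'a,1,lijhgf' 'b,2,other' 'c,3,more' > /tmp/lookup_demo.csv

# Build the lookup table without eval: split each line on commas and
# keep the first field as the key and the second as the value.
declare -A lookup
while IFS=, read -r key value _rest; do
    lookup[$key]=$value
done < /tmp/lookup_demo.csv

echo "${lookup[b]}"   # prints 2
```

This avoids any quoting surprises that eval could run into if a key or value happened to contain shell metacharacters.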

My local example in use:

$: cat x
a,1,lijhgf
b,2,;lsaoidj
c,3,;l'skd

$: echo "declare -A lookup=(
   $( sed -E 's/^([^,]+),([^,]+).*/  [\1]=\2/' x )
   )"
   declare -A lookup=(
     [a]=1
     [b]=2
     [c]=3
   )

$: eval "declare -A lookup=(
   $( sed -E 's/^([^,]+),([^,]+),.*/  [\1]=\2/' x )
   )"

$: echo "${lookup[b]}"
   2

Upvotes: 3

kvantour

Reputation: 26481

First of all, let's look at your command:

myvalue=$(grep $myvar filename | cut -d, -f2)

You spawn two binaries (grep and cut) to process the data. You should attempt to reduce this to a single binary. This already helps a lot:

myvalue=$(awk -F, -v var="$myvar" '$0~var { print $2; exit}' filename)

This will be much faster because:

  • it spawns a single process
  • it stops reading the file the moment the entry is found
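Note that `$0~var` matches the pattern anywhere on the line; if the key is always the first column, an exact comparison against `$1` avoids accidental partial matches. A sketch, using hypothetical sample data:

```shell
#!/usr/bin/env bash
# Hypothetical file in the question's key,value,... format.
printf '%s\n' 'a,1,extra' 'ab,10,extra' > /tmp/kv_demo.csv

myvar=a
# $1==var compares the whole first field, so the key "a"
# cannot accidentally match the line whose key is "ab".
myvalue=$(awk -F, -v var="$myvar" '$1==var { print $2; exit }' /tmp/kv_demo.csv)
echo "$myvalue"   # prints 1
```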

If you need to do multiple lookups based on the key which is located in the first column, you can do the following in bash:

 while IFS= read -r; do
    declare -A z+="( $REPLY )"
 done < <(awk -F, '{print "["$1"]="$0}' lookupfile)

 echo ${z[$key]}

Based on How do I populate a bash associative array with command output?

Upvotes: 2

Dominique

Reputation: 17491

One of the obvious things I'm thinking of is limiting the number of grep results, which can be done with the -m switch:

Prompt>cat test.txt
a
a
b
a
b

Prompt>grep "a" test.txt
a
a
a

Prompt>grep -m 1 "a" test.txt
a
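Applied to the lookup in the question, -m 1 caps the scan at the first hit; anchoring the pattern to the start of the line also keeps the key from matching text in the value columns. A sketch, assuming keys only ever appear in column one (the sample file is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical file in the question's key,value,... format.
printf '%s\n' 'a,1,x' 'b,2,y' 'a,9,z' > /tmp/grep_demo.csv

myvar=a
# -m 1 stops grep after the first match; "^$myvar," anchors the key
# to the first column so values are not matched by accident.
myvalue=$(grep -m 1 "^$myvar," /tmp/grep_demo.csv | cut -d, -f2)
echo "$myvalue"   # prints 1
```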

Upvotes: 2
