jessica

Reputation: 2610

Don't match regex when variable is empty

I have the following file:

more /etc/hosts
23.1.22.162 kafka01.dfg.com
23.1.22.155 kafka02.dfg.com
23.1.22.222 kafka03.dfg.com
23.1.22.111 master01.dfg.com
23.1.22.239 master02.dfg.com
23.1.22.170 master03.dfg.com
23.1.22.167 worker01.dfg.com
23.1.22.165 worker02.dfg.com
23.1.22.112 worker03.dfg.com

We want to capture all master and worker machines with egrep when kafka_name="", so we ran:

kafka_name=""
egrep "\smaster|\sworker|\s$kafka_name"  /etc/hosts

but the output still includes the kafka machines:

 egrep "\smaster|\sworker|\s$kafka_name"  /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
23.1.22.162 kafka01.dfg.com
23.1.22.155 kafka02.dfg.com
23.1.22.222 kafka03.dfg.com
23.1.22.111 master01.dfg.com
23.1.22.239 master02.dfg.com
23.1.22.170 master03.dfg.com
23.1.22.167 worker01.dfg.com
23.1.22.165 worker02.dfg.com
23.1.22.112 worker03.dfg.com

However, when we set

kafka_name="kafka"

we also get the kafka machines:

egrep "\smaster|\sworker|\s$kafka_name"  /etc/hosts
23.1.22.162 kafka01.dfg.com
23.1.22.155 kafka02.dfg.com
23.1.22.222 kafka03.dfg.com
23.1.22.111 master01.dfg.com
23.1.22.239 master02.dfg.com
23.1.22.170 master03.dfg.com
23.1.22.167 worker01.dfg.com
23.1.22.165 worker02.dfg.com
23.1.22.112 worker03.dfg.com

So why, when we set

kafka_name=""

does it still print the kafka machines from the hosts file even though $kafka_name is empty?

Upvotes: 0

Views: 85

Answers (2)

Ed Morton

Reputation: 203324

FYI egrep is deprecated in favor of grep -E.

Consider using awk instead, though, for clear, simple control over whatever conditions (not just regexps - conditions) you want to express, e.g.:

$ kafka_name=''
$ awk -v kafka_name="$kafka_name" '( $2 ~ /^(master|worker)/ ) || ( (kafka_name != "") && ($2 ~ ("^"kafka_name)) )' file
23.1.22.111 master01.dfg.com
23.1.22.239 master02.dfg.com
23.1.22.170 master03.dfg.com
23.1.22.167 worker01.dfg.com
23.1.22.165 worker02.dfg.com
23.1.22.112 worker03.dfg.com

$ kafka_name='kafka02'
$ awk -v kafka_name="$kafka_name" '( $2 ~ /^(master|worker)/ ) || ( (kafka_name != "") && ($2 ~ ("^"kafka_name)) )' file
23.1.22.155 kafka02.dfg.com
23.1.22.111 master01.dfg.com
23.1.22.239 master02.dfg.com
23.1.22.170 master03.dfg.com
23.1.22.167 worker01.dfg.com
23.1.22.165 worker02.dfg.com
23.1.22.112 worker03.dfg.com

The above will work using any POSIX awk in any shell on every Unix box.

It uses regexp rather than string comparisons, though, just as your egrep command did, so if any of those names can contain regexp metachars you'd need to escape them or change the script to use index($2,string) == 1 everywhere instead of $2 ~ /^regexp/, e.g.:

$ awk -v kafka_name="$kafka_name" '(index($2,"master") == 1) || (index($2,"worker") == 1) || ( (kafka_name != "") && (index($2,kafka_name) == 1) )' file
23.1.22.155 kafka02.dfg.com
23.1.22.111 master01.dfg.com
23.1.22.239 master02.dfg.com
23.1.22.170 master03.dfg.com
23.1.22.167 worker01.dfg.com
23.1.22.165 worker02.dfg.com
23.1.22.112 worker03.dfg.com
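
To illustrate why the distinction matters, here is a minimal sketch comparing the two approaches. The two-line sample file and the name kafka0. (with a trailing dot) are hypothetical, chosen only to show a metachar being interpreted by the regexp version:

```shell
# Hypothetical two-line sample standing in for the hosts file.
sample=$(mktemp)
printf '%s\n' \
  '23.1.22.162 kafka01.dfg.com' \
  '23.1.22.155 kafka02.dfg.com' > "$sample"

kafka_name='kafka0.'   # hypothetical name containing the regexp metachar "."

# Regexp match: "." matches any character, so kafka01 and kafka02 both match.
regex_hits=$(awk -v n="$kafka_name" '$2 ~ ("^" n) {c++} END {print c+0}' "$sample")

# Literal prefix match: no host name starts with the string "kafka0.".
index_hits=$(awk -v n="$kafka_name" 'index($2, n) == 1 {c++} END {print c+0}' "$sample")

printf 'regexp matches: %s\n' "$regex_hits"
printf 'index matches:  %s\n' "$index_hits"

rm -f "$sample"
```

With the regexp version the dot acts as a wildcard, while index() treats the whole name as a plain string.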

Upvotes: 3

Barmar

Reputation: 780851

When $kafka_name is empty, the pattern is "\smaster|\sworker|\s", and the last alternative matches any line containing a whitespace character, so it matches every line in the file.
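
You can check this by printing the pattern after the shell has expanded the variable. The following is a minimal sketch, assuming a three-line sample file rather than the real /etc/hosts, and using [[:space:]] as the POSIX spelling of the GNU \s shorthand:

```shell
# Hypothetical sample standing in for /etc/hosts.
sample=$(mktemp)
printf '%s\n' \
  '23.1.22.162 kafka01.dfg.com' \
  '23.1.22.111 master01.dfg.com' \
  '23.1.22.167 worker01.dfg.com' > "$sample"

kafka_name=""
pattern="[[:space:]]master|[[:space:]]worker|[[:space:]]$kafka_name"

# After expansion the last alternative is nothing but a whitespace class.
printf 'pattern: %s\n' "$pattern"

# Every line has whitespace between the address and the name, so all lines match.
matched=$(grep -E -c "$pattern" "$sample")
printf 'matched lines: %s\n' "$matched"

rm -f "$sample"
```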

One option is to set $kafka_name to something you know will never exist instead of an empty string, e.g.

kafka_name=kafkaXXXX

Another is to add $kafka_name to the pattern only when it's not empty:

pattern="\smaster|\sworker"
if [ -n "$kafka_name" ]
then pattern="$pattern|\s$kafka_name"
fi
egrep "$pattern" /etc/hosts
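
The same idea can also be written with the shell's ${var:+word} expansion, which produces word only when the variable is set and non-empty. This is a sketch rather than a drop-in replacement: the filter_hosts helper and the sample file are hypothetical, and it uses grep -E with [[:space:]] instead of egrep with \s:

```shell
# Hypothetical sample standing in for /etc/hosts.
sample=$(mktemp)
printf '%s\n' \
  '23.1.22.162 kafka01.dfg.com' \
  '23.1.22.111 master01.dfg.com' \
  '23.1.22.167 worker01.dfg.com' > "$sample"

# filter_hosts NAME FILE (hypothetical helper):
# the third alternative is appended only when NAME is non-empty.
filter_hosts() {
  grep -E "[[:space:]]master|[[:space:]]worker${1:+|[[:space:]]$1}" "$2"
}

without_kafka=$(filter_hosts "" "$sample")
with_kafka=$(filter_hosts "kafka" "$sample")

printf 'empty name:\n%s\n' "$without_kafka"
printf 'name=kafka:\n%s\n' "$with_kafka"

rm -f "$sample"
```

With an empty name only the master and worker lines are printed; with kafka the kafka line is included as well.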

Upvotes: 2
