AkshaiShah

Reputation: 5939

Regular expression not matching as I expect

I have the following string:

connect_2014-06-03.csv

and the following regex:

(\S+)[_-]

What I want to do is extract only the first word, i.e. connect, from the string, but for some reason the regex matches connect_2014-06- (with capture group 1 being connect_2014-06). I have tried to make it non-greedy with (\S+)[_-]? but that does not seem to work.

Anyone have any idea?
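A minimal Bash sketch that reproduces the behavior described in the question, assuming Bash's =~ operator (POSIX extended regular expressions). \S is not a POSIX ERE escape, so the sketch spells it as [^[:space:]]; the variable names are illustrative only. It shows that the greedy + backtracks only as far as the last _ or -, and that making [_-] optional simply lets the group swallow the whole string.

```shell
#!/usr/bin/env bash
s='connect_2014-06-03.csv'

# The question's pattern, with \S written as the POSIX class [^[:space:]].
re='([^[:space:]]+)[_-]'
if [[ $s =~ $re ]]; then
  whole=${BASH_REMATCH[0]}    # entire match: connect_2014-06-
  greedy=${BASH_REMATCH[1]}   # capture group 1: connect_2014-06
  echo "match: $whole"
  echo "group: $greedy"
fi

# Making the separator optional does not make + lazy: the group takes
# the whole string and [_-]? matches nothing at the end.
re_opt='([^[:space:]]+)[_-]?'
if [[ $s =~ $re_opt ]]; then
  optional=${BASH_REMATCH[1]}  # connect_2014-06-03.csv
  echo "group: $optional"
fi
```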

Upvotes: 0

Views: 75

Answers (3)

simbabque

Reputation: 54323

It's the + that is greedy, not the overall regex. You need to make the \S+ inside your capture group non-greedy by adding a ? after the +:

(\S+?)[_-]

Also see this regex101 demo.

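Maybe it makes sense not to match arbitrary non-whitespace characters at all, and instead use a narrower pattern such as ([a-z]+)_ that captures only the letters before the underscore. Remember, dashes and underscores are non-whitespace characters too, so \S+ happily runs across them.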
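A sketch of that suggestion with Bash's =~, assuming the prefix consists of lowercase letters as in the example filename (a prefix containing uppercase letters or digits would need a wider class). The leading ^ anchor is an addition so the match must start at the beginning of the name.

```shell
#!/usr/bin/env bash
s='connect_2014-06-03.csv'

# [a-z]+ cannot run across '_', so no non-greedy quantifier is needed.
# Note: the exact meaning of the range a-z can depend on the locale.
re='^([a-z]+)_'
if [[ $s =~ $re ]]; then
  prefix=${BASH_REMATCH[1]}
  echo "$prefix"   # connect
fi
```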

Upvotes: 4

Miller

Reputation: 35198

There are two easy solutions to this.

You can explicitly state that you want non-greedy by adding a ? to your quantifier.

(\S+?)[_-]

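Or you can use a negated character class that excludes the separators, so the match stops by itself:

([^\s_-]+)

Put the - last inside the brackets: in [^_-\s] the hyphen sits between _ and \s, which some engines (Python's re, for example) reject as an invalid range.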
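The negated-class approach also carries over to Bash's =~, where it is the natural choice because POSIX extended regular expressions have no lazy quantifier. A sketch, assuming [:space:] as the portable spelling of \s and an added ^ anchor; the hyphen is again placed last in the bracket expression so it is read literally rather than as a range.

```shell
#!/usr/bin/env bash
s='connect_2014-06-03.csv'

# One or more characters that are not whitespace, '_' or '-'.
# '-' is last inside the brackets so it is a literal, not a range.
re='^([^[:space:]_-]+)'
if [[ $s =~ $re ]]; then
  prefix=${BASH_REMATCH[1]}
  echo "$prefix"   # connect
fi
```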

Upvotes: 1

anubhava

Reputation: 784898

You can use Bash parameter expansion instead of a regex:

s='connect_2014-06-03.csv'
echo "${s%%_*}"
connect
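The %% operator removes the longest suffix matching a glob pattern, while a single % removes the shortest one, which is a rough analogue of greedy versus non-greedy matching. A sketch that also treats - as a separator, mirroring the question's [_-] (that separator set is an assumption carried over from the question, not part of the answer above):

```shell
#!/usr/bin/env bash
s='connect_2014-06-03.csv'

# %% strips the LONGEST suffix that starts with '_' or '-': "_2014-06-03.csv".
longest=${s%%[_-]*}
echo "$longest"    # connect

# % strips the SHORTEST such suffix: only "-03.csv".
shortest=${s%[_-]*}
echo "$shortest"   # connect_2014-06
```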

To use a regex instead, you can write:

[[ "$s" =~ ^([^_]+) ]] && echo "${BASH_REMATCH[1]}"
connect

Upvotes: 1
