Bruno Peixoto

Reputation: 219

Word extraction with regex string

From this post, I am able to recognize the pattern object.* using the regex m/(?<=object\.)\w*/. However, since I am unfamiliar with Linux, I cannot use sed or perl properly to extract the desired tokens, so I need your help. My best guess is grep -E -n object file.txt | perl -nle 'm/(?<=object\.)\w*/; print $1'.

Upvotes: 0

Views: 46

Answers (2)

ikegami

Reputation: 385590

$1 contains what the first capture group ((...)) captured. But your pattern doesn't have any capture groups; the lookbehind (?<=...) doesn't capture anything.

Instead, you want $&, which contains the text matched by the pattern.

grep -E -n object file.txt | perl -nle'm/(?<=object\.)\w*/; print $&'

And rather than printing unconditionally, you can print only if a match is found, eliminating the need for grep.

perl -nle'print $& if /(?<=object\.)\w+/' file.txt

Finally, we don't need the relatively slow lookbehind; a capture group does the same job.

perl -nle'print $1 if /object\.(\w+)/' file.txt

On some systems, grep can also do the job using -o and -P.

grep -oP '(?<=object\.)\w+' file.txt

Upvotes: 1

Wiktor Stribiżew

Reputation: 626699

You can use grep or sed:

grep -oP '(?<=object\.)\w+' file
sed -nE 's/.*object\.([[:alnum:]_]+).*/\1/p' file

See the online demo.

In grep -oP, the -P option enables PCRE regex syntax (needed for the lookbehind), and the -o option prints only the matched texts, one per output line, instead of the whole matching lines.

The sed command is more complex and extracts only one match per line (the last one, because the leading .* is greedy). First, it suppresses the default line output with -n and sets the regex flavor to POSIX ERE with -E. Then it matches a whole line containing object. followed by one or more alphanumeric or underscore chars, which are captured into \1, and replaces the full line with the Group 1 value. The p flag prints only the lines where the substitution succeeded.
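To make the difference between the two commands concrete, here is a minimal sketch on a made-up input line that contains two matches (the line and variable names are assumptions for illustration). It assumes a grep built with PCRE support for -P, such as GNU grep on most Linux systems.

```shell
# Hypothetical one-line input containing two matches.
line='a = object.first; b = object.second'

# grep -o prints each match on its own output line.
grep_out=$(printf '%s\n' "$line" | grep -oP '(?<=object\.)\w+')

# sed: the leading greedy .* consumes as much as it can,
# so only the last match on the line ends up in \1.
sed_out=$(printf '%s\n' "$line" | sed -nE 's/.*object\.([[:alnum:]_]+).*/\1/p')

printf 'grep:\n%s\n' "$grep_out"
printf 'sed:\n%s\n' "$sed_out"
```

If every occurrence on a line is needed, the grep form is the simpler choice; the sed form fits when each line holds at most one token of interest.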

Upvotes: 1
