Zzrot
Zzrot

Reputation: 324

Trying to use regex in bash for a specific word in a column uusing bash

My file is like this -

dog_xyz123
cat_xyz_lm
sun_xyz-hi
moon_xyzabc

Now I want to keep only the lines which have xyz completely. What this means is any string with _ and - along with xyz is allowed and even if there are numbers attached, it is fine. Just that xyz should not be a substring of another letter. That would mean that xyzabc would not be allowed nor would abcxyz.

What I have tried is as follows :

 awk 'match($1,/[-_]?xyz[-_][A-Za-z_0-9-]+/) {print $1}' filename

but it doesn't seem to work.

Upvotes: 1

Views: 77

Answers (6)

Sundeep
Sundeep

Reputation: 23667

If you have grep with pcre

$ cat ip.txt 
dog_xyz123
xyz4
ABCxyz
abc_Xyz-123
cat_xyz_lm
sun_xyz-hi
xyz
moon_xyzabc
2xyz

$ grep -P '(?<![A-Za-z])xyz(?![A-Za-z])' ip.txt 
dog_xyz123
xyz4
cat_xyz_lm
sun_xyz-hi
xyz
2xyz
  • xyz pattern to match
  • (?<![A-Za-z]) negative lookbehind - pattern cannot have letter before it
  • (?![A-Za-z]) negative lookahead - pattern cannot have letter after it

For case-insensitive version, like when Xyz, xYz, etc are also valid matches

$ grep -iP '(?<![a-z])xyz(?![a-z])' ip.txt 
dog_xyz123
xyz4
abc_Xyz-123
cat_xyz_lm
sun_xyz-hi
xyz
2xyz
  • -i case-insensitive matching

Upvotes: 1

James Brown
James Brown

Reputation: 37404

any string with _ and - along with xyz is allowed and even if there are numbers attached, it is fine - - xyz should not be a substring of another letter, ie. xyz surrounded by anything but letters, including the beginning (^) and the end ($) of record:

$ grep "\(^\|[^a-zA-Z]\)xyz\([^a-zA-Z]\|$\)" foo
dog_xyz123
cat_xyz_lm
sun_xyz-hi

Modifying your awk solution to support this:

awk 'match($0,/(^|[^a-zA-Z])xyz([^a-zA-Z]|$)/ {print $0}' foo
dog_xyz123
cat_xyz_lm
sun_xyz-hi

Upvotes: 0

Claes Wikner
Claes Wikner

Reputation: 1517

I think this is what you need.

awk '/_xyz-/' file
sun_xyz-hi

Upvotes: 0

Ed Morton
Ed Morton

Reputation: 203522

With that input all you need is:

awk -F'[-_]' '$2=="xyz"' file

If that's not what you need then edit your question to include more truly representative sample input/output.

Upvotes: 0

redneb
redneb

Reputation: 23850

I think you need something like that:

grep -E '^(.*[^A-Za-z])?xyz([^A-Za-z].*)?$'

It will return all lines that contain xyz when it's not preceded or followed by a letter.

Upvotes: 0

Sebastian Lenartowicz
Sebastian Lenartowicz

Reputation: 4864

You can use

grep -e "[_-]\d*xyz/d*[_-]" <infile>

Which should print the lines you want.

Upvotes: 0

Related Questions