Reputation: 324
My file looks like this:
dog_xyz123
cat_xyz_lm
sun_xyz-hi
moon_xyzabc
Now I want to keep only the lines that contain xyz as a complete token. Any string with _ or - along with xyz is allowed, and numbers attached to it are fine too. The only rule is that xyz must not be part of a longer run of letters, so xyzabc would not be allowed, nor would abcxyz.
What I have tried so far:
awk 'match($1,/[-_]?xyz[-_][A-Za-z_0-9-]+/) {print $1}' filename
but it doesn't seem to work.
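Running the attempted command on the sample input makes the failure concrete. The [-_] after xyz is mandatory, so dog_xyz123 is never matched; and because the leading [-_]? is optional, a line such as abcxyz_1 (a hypothetical extra test line, not from the original file) is wrongly accepted. A minimal sketch, assuming a POSIX shell and any POSIX awk:

```shell
#!/bin/sh
# Sample lines from the question plus one hypothetical line, abcxyz_1,
# which should be rejected because xyz is glued to the letters "abc".
sample='dog_xyz123
cat_xyz_lm
sun_xyz-hi
moon_xyzabc
abcxyz_1'

# The attempted command, unchanged: a separator AFTER xyz is required,
# while the separator BEFORE it is optional.
attempt=$(printf '%s\n' "$sample" |
  awk 'match($1,/[-_]?xyz[-_][A-Za-z_0-9-]+/) {print $1}')

printf '%s\n' "$attempt"
# cat_xyz_lm
# sun_xyz-hi
# abcxyz_1
```

So the attempt both misses a valid line (dog_xyz123) and keeps an invalid one (abcxyz_1).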
Upvotes: 1
Views: 77
Reputation: 23667
If you have grep with PCRE support (-P):
$ cat ip.txt
dog_xyz123
xyz4
ABCxyz
abc_Xyz-123
cat_xyz_lm
sun_xyz-hi
xyz
moon_xyzabc
2xyz
$ grep -P '(?<![A-Za-z])xyz(?![A-Za-z])' ip.txt
dog_xyz123
xyz4
cat_xyz_lm
sun_xyz-hi
xyz
2xyz
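The lookarounds in this pattern are zero-width: they test the character next to xyz without consuming it. Comparing grep -o output with a consuming ERE version of the same idea shows the difference. A sketch, assuming GNU grep built with PCRE support (required for -P):

```shell
#!/bin/sh
line='dog_xyz123'

# Lookaround version: only "xyz" itself is part of the match.
pcre=$(printf '%s\n' "$line" | grep -oP '(?<![A-Za-z])xyz(?![A-Za-z])')

# Consuming ERE version: the neighbouring non-letters are part of the match.
ere=$(printf '%s\n' "$line" | grep -oE '(^|[^A-Za-z])xyz([^A-Za-z]|$)')

printf 'PCRE: %s\nERE:  %s\n' "$pcre" "$ere"
# PCRE: xyz
# ERE:  _xyz1
```

For plain line filtering, as in the question, both forms select the same lines; the difference only shows up with -o or in substitutions.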
xyz: the pattern to match
(?<![A-Za-z]): negative lookbehind, so the match cannot have a letter before it
(?![A-Za-z]): negative lookahead, so the match cannot have a letter after it
For a case-insensitive version, where Xyz, xYz, etc. are also valid matches:
$ grep -iP '(?<![a-z])xyz(?![a-z])' ip.txt
dog_xyz123
xyz4
abc_Xyz-123
cat_xyz_lm
sun_xyz-hi
xyz
2xyz
-i enables case-insensitive matching
Upvotes: 1
Reputation: 37404
You say that any string with _ and - along with xyz is allowed, that attached numbers are fine, and that xyz should not be part of a longer word. In other words, xyz must be surrounded by anything but letters, where the beginning (^) and the end ($) of the record also count:
$ grep "\(^\|[^a-zA-Z]\)xyz\([^a-zA-Z]\|$\)" foo
dog_xyz123
cat_xyz_lm
sun_xyz-hi
Modifying your awk solution to support this:
awk 'match($0, /(^|[^a-zA-Z])xyz([^a-zA-Z]|$)/) {print $0}' foo
dog_xyz123
cat_xyz_lm
sun_xyz-hi
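The awk version can be made case-insensitive in the same spirit as grep -i. A sketch using tolower(), a standard POSIX awk string function: the regex is tested against a lower-cased copy of the record, while the default action still prints the original line. The mixed-case line abc_Xyz-123 is borrowed from the PCRE answer's sample.

```shell
#!/bin/sh
sample='dog_xyz123
abc_Xyz-123
ABCxyz
moon_xyzabc'

# Lower-case a copy of each record for the test; $0 itself is untouched,
# so matching lines are printed exactly as they appear in the input.
result=$(printf '%s\n' "$sample" |
  awk 'tolower($0) ~ /(^|[^a-z])xyz([^a-z]|$)/')

printf '%s\n' "$result"
# dog_xyz123
# abc_Xyz-123
```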
Upvotes: 0
Reputation: 203522
With that input all you need is:
awk -F'[-_]' '$2=="xyz"' file
If that's not what you need then edit your question to include more truly representative sample input/output.
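If digits may be attached to xyz (in dog_xyz123, $2 is xyz123 rather than xyz), the field-splitting idea can be extended to check every field. A sketch, assuming _ and - are the only separators and that only digits may be glued to xyz:

```shell
#!/bin/sh
sample='dog_xyz123
cat_xyz_lm
sun_xyz-hi
moon_xyzabc'

# Split on _ or -, then accept the line if any field is xyz with optional
# leading/trailing digits; "next" stops after the first matching field.
result=$(printf '%s\n' "$sample" |
  awk -F'[-_]' '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]*xyz[0-9]*$/) { print; next } }')

printf '%s\n' "$result"
# dog_xyz123
# cat_xyz_lm
# sun_xyz-hi
```

Unlike the lookaround answers, this rejects xyz next to other punctuation such as a dot (e.g. a.xyz), since only _ and - split fields.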
Upvotes: 0
Reputation: 23850
I think you need something like this:
grep -E '^(.*[^A-Za-z])?xyz([^A-Za-z].*)?$' file
It returns all lines that contain xyz neither preceded nor followed by a letter.
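Because the optional groups also cover the start and end of the line, this plain ERE selects the same lines as the PCRE lookaround version. A sketch that checks this on the extended ip.txt sample from the PCRE answer, assuming any POSIX grep (-E is standard):

```shell
#!/bin/sh
# The ip.txt sample from the PCRE answer.
sample='dog_xyz123
xyz4
ABCxyz
abc_Xyz-123
cat_xyz_lm
sun_xyz-hi
xyz
moon_xyzabc
2xyz'

# (.*[^A-Za-z])? : nothing before xyz, or anything ending in a non-letter
# ([^A-Za-z].*)? : nothing after xyz, or a non-letter followed by anything
result=$(printf '%s\n' "$sample" | grep -E '^(.*[^A-Za-z])?xyz([^A-Za-z].*)?$')

printf '%s\n' "$result"
# dog_xyz123
# xyz4
# cat_xyz_lm
# sun_xyz-hi
# xyz
# 2xyz
```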
Upvotes: 0
Reputation: 4864
You can use
grep -P '[_-]\d*xyz\d*[_-]' infile
This prints lines in which xyz, optionally with digits on either side, sits between a _ or - on each side.
Upvotes: 0