Nerdio

Reputation: 1003

Regex in grep to find non-numeric characters in a string

I want to use a regular expression to find strings in a file where part of the string is non-numeric.

This would be a good string IDxxxxxx0123456789.

This would be a bad string IDxxxxxx01234?6789.

The file I am grepping has many different lines of text, and I am specifically interested in the ones that contain IDxxxxxx followed by what should be 10 digits. I want to find the lines where those 10 characters are not all digits.

I have this so far:

 grep "ID.\{6\}[^0-9]" myFile

This works fine if the first character after IDxxxxxx is non-numeric. So I extended it as follows:

 grep "ID.\{6\}[^0-9]\{1,10\}" myFile

which I hoped would mean IDxxxxxx followed by 1 to 10 non-numeric characters. Again, this works if the first character is non-numeric, but not if only the second one is.
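A quick way to see what the second attempt actually does is to run it against three sample IDs (made up here for illustration): one good, one with a non-digit in the middle, and one with a non-digit first. A minimal sketch, assuming a POSIX shell and grep:

```shell
# Made-up sample IDs: good, non-digit in the middle, non-digit first.
# The pattern is the question's second attempt (BRE, so the braces are escaped).
printf '%s\n' 'IDxxxxxx0123456789' 'IDxxxxxx01234?6789' 'IDxxxxxx?123456789' |
  grep 'ID.\{6\}[^0-9]\{1,10\}'
# prints: IDxxxxxx?123456789
```

Only the third line is reported: [^0-9]\{1,10\} has to start matching at the very first character after IDxxxxxx, so a leading digit rules the line out no matter what follows.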

I think I must be getting close, but not close enough. Can anyone steer me a little on this one, please? I shall keep at it, and if I find an answer before anyone else does, I will post it here.

Thanks in anticipation

(Update: I want grep to print all the bad strings.)

Upvotes: 2

Views: 6946

Answers (3)

Prince John Wesley

Reputation: 63698

  grep -Po '\bID.{6}(?!\d{10}).{10}\b' inputFiles
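For readers unfamiliar with the syntax: -P switches grep to Perl-compatible regexes (available in GNU grep builds with PCRE support), -o prints only the matched part of the line, and (?!\d{10}) is a negative lookahead that succeeds only when the next ten characters are not all digits. It consumes nothing, so .{10} then captures those same ten characters. A small sketch using the question's two sample lines, plus one made-up line that shows a caveat of the trailing \b:

```shell
# Needs a grep with PCRE support (GNU grep's -P).
# (?!\d{10}) rejects the position if the next ten characters are all digits;
# it consumes nothing, so .{10} then matches those same ten characters.
printf '%s\n' \
  'This would be a good string IDxxxxxx0123456789' \
  'This would be a bad string IDxxxxxx01234?6789' |
  grep -Po '\bID.{6}(?!\d{10}).{10}\b'
# prints: IDxxxxxx01234?6789

# Caveat: the trailing \b needs a word/non-word transition after the tenth
# character, so a bad ID that ends in punctuation at end of line is missed.
printf '%s\n' 'IDxxxxxx012345678?' |
  grep -Po '\bID.{6}(?!\d{10}).{10}\b' || echo 'no match'
# prints: no match
```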

Upvotes: 2

Zagorax

Reputation: 11890

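You're writing [^0-9], but a ^ at the start of a bracket expression means "any character except the ones listed", so [^0-9] matches a non-digit. Match the digits instead, and ask for exactly 10 of them (unescaped braces need grep -E):

 grep -E "ID.{6}[0-9]{10}\b" myFile

With your pattern, a line matches only if the run of non-numeric characters starts immediately after IDxxxxxx, because [^0-9]\{1,10\} must match at least one non-digit in that position. That is why it fails when the first character is a digit.

Moreover, you need to add \b. It is a word boundary: since the preceding character is a digit, it requires that the next character is not a letter, digit or underscore (a space, comma, end of line, and so on). Without it, an ID followed by more than 10 digits would also match.

This pattern matches the good strings; to list the bad ones, add -v, keeping in mind that -v also prints lines that contain no ID at all.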
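A minimal sketch comparing the {1,10} and {10} forms on the question's two sample IDs, assuming GNU grep (where \b is supported as an extension in -E mode):

```shell
# GNU grep: -E gives ERE syntax (unescaped braces); \b is a GNU extension.
# {1,10}: the digit run can stop at "01234", and the boundary between
# "4" and "?" satisfies \b, so BOTH sample IDs are printed.
printf '%s\n' 'IDxxxxxx0123456789' 'IDxxxxxx01234?6789' |
  grep -E 'ID.{6}[0-9]{1,10}\b'

# {10}: exactly ten digits are required, so only the good ID is printed.
printf '%s\n' 'IDxxxxxx0123456789' 'IDxxxxxx01234?6789' |
  grep -E 'ID.{6}[0-9]{10}\b'
```

Requiring exactly ten digits is what separates the two strings; \b on its own does not, because "?" already counts as a boundary.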

Upvotes: 0

Here are your strings:

 $> cat ./text
 This would be a good string IDxxxxxx0123456789
 This would be a bad string IDxxxxxx01234?6789

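The idea is to match the good strings and use the --invert-match (-v) flag, which prints every line that does not match. Note that this includes any line with no ID in it at all.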

 $> grep --perl-regex --invert-match "ID.{6}[0-9]{10}" ./text
 This would be a bad string IDxxxxxx01234?6789
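Because --invert-match reports every line without a good ID, unrelated lines in a mixed file come through as well. One alternative, not taken from any of the answers here, is a single POSIX ERE that matches the bad IDs directly: after ID and six characters, allow up to nine digits and then require a non-digit or the end of the line. A sketch, assuming "ID" only appears at the start of these identifiers (the unrelated sample line is made up):

```shell
# Made-up sample: the question's two lines plus an unrelated line.
sample=$(printf '%s\n' \
  'This would be a good string IDxxxxxx0123456789' \
  'This would be a bad string IDxxxxxx01234?6789' \
  'An unrelated line of text')

# Inverted match (same idea as above, written with -E): prints the bad
# line AND the unrelated line.
printf '%s\n' "$sample" | grep -Ev 'ID.{6}[0-9]{10}'

# Direct match: up to nine digits, then a non-digit or end of line.
# Only the bad line is printed.
printf '%s\n' "$sample" | grep -E 'ID.{6}[0-9]{0,9}([^0-9]|$)'
```

A side effect of the ([^0-9]|$) alternative is that an ID cut short by the end of the line (fewer than ten characters after IDxxxxxx) is also reported as bad.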

Upvotes: 0
