Reputation: 11
Problem: I have a text with multiple lines. One line can contain multiple sentences. I need a regex that matches only the lines where the first word of the line contains a number of any length (it could be 1 or 2234234).
For example:
Thi5 is a t3st. I want this line in my result.
This is also a test but with a number in the first word of the second sentence. Th1s is the 2nd sentence, i don't want this in my result.
This is a t3st, but i am also not interested in this line.
Th1s i want too.
0r this one as well
   0r this one i want regardless of the whitespace in front of it
But n0t this.
I have to admit that I am a n00b at regex. So far I have found the following:
^(.*)?[0-9](.*)?
However, it also matches if there is a number in e.g. the third word but not in the first one.
I see that ^(.*)? matches anything from the start of the line, so it also matches any text up to the third word that contains the number.
To make it more complicated, the first word could also contain special characters (?/&%$"§ or any other).
If I used a character class such as ^[a-zA-Z]? instead of ^(.*)? everything would be fine as far as I can see, but it wouldn't catch whitespace or special characters, nor more than one character in front of the number.
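To make the over-matching concrete, here is a minimal sketch in Python (an assumption, since the question does not name a regex flavor; the variable names are only illustrative). It runs the pattern from above against one of the lines that should not match:

```python
import re

# The pattern from the question; the leading group is greedy.
pattern = re.compile(r"^(.*)?[0-9](.*)?")

line = "But n0t this."
match = pattern.match(line)

# The leading (.*)? swallows everything up to the digit, including
# the whole first word, so the line matches although "But" has no digit.
print(match is not None)
print(repr(match.group(1)))
```

The first group captures text that spans more than the first word, which is why a digit in any later word is enough for a match.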
Upvotes: 1
Views: 1472
Reputation: 336158
You can use this:
^\s*\S*[0-9].*
Explanation:
^      # Start of the line (start of the string unless multiline mode is on)
\s*    # Match optional whitespace at the start of the line
\S*    # Match any number of non-whitespace characters
[0-9]  # Match a digit
.*     # Match the rest of the line
See it live on regex101.com.
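As a rough sketch of how this pattern could be applied, the following Python snippet tests each line separately, so no multiline flag is needed. Python, the helper name lines_with_digit_in_first_word, and the shortened sample text are assumptions made for illustration; they are not part of the question.

```python
import re

# Pattern from the answer above, compiled once and applied per line.
FIRST_WORD_HAS_DIGIT = re.compile(r"^\s*\S*[0-9].*")

def lines_with_digit_in_first_word(text):
    """Return the lines whose first word contains at least one digit."""
    return [line for line in text.splitlines()
            if FIRST_WORD_HAS_DIGIT.match(line)]

# Shortened sample based on the lines in the question.
sample = "\n".join([
    "Thi5 is a t3st.",
    "This is a t3st",
    "Th1s i want too.",
    "   0r this one as well",
    "But n0t this.",
])

for line in lines_with_digit_in_first_word(sample):
    print(line)
```

Splitting the text first keeps \s* from running across line breaks, which it could do if the pattern were applied to the whole text in multiline mode.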
Upvotes: 3
Reputation: 4429
I think you need to check for whitespace. Try: ^\s*\S*[0-9]+\S*\s
^ can either mean "anything except" when it opens a character class, e.g. [^9] is any character except the digit 9, or it can mean match from the beginning of the string, as it does here.
\s* means match optional whitespace, i.e. \s is match whitespace and * is zero or more times.
\S* is match optional non-whitespace, i.e. any character except spaces, tabs, newlines, carriage returns and other whitespace.
[0-9]+ is match 1 or more digits, i.e. [0-9] is match a digit and + is 1 or more times.
\S* is the same as \S* above.
\s is match 1 whitespace character. Because of it, the first word has to be followed by whitespace, so a line that consists of a single word with nothing after it will not match.
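A small Python sketch (Python is an assumption here) shows how the trailing \s behaves. The second pattern, named relaxed, is only an illustrative variant that also accepts the end of the line; it is not part of the answer above.

```python
import re

# Pattern from this answer: whitespace is required after the first word.
with_trailing_space = re.compile(r"^\s*\S*[0-9]+\S*\s")

# Illustrative variant (not from the answer): whitespace OR end of line.
relaxed = re.compile(r"^\s*\S*[0-9]+\S*(?:\s|$)")

for line in ["Th1s i want too.", "Th1s", "But n0t this."]:
    print(repr(line),
          bool(with_trailing_space.match(line)),
          bool(relaxed.match(line)))
```

The one-word line "Th1s" is rejected by the original pattern and accepted by the variant, while "But n0t this." is rejected by both.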
Upvotes: 0