Reputation: 11132
I want to extract numbers and only numbers from a string.
Say I have a string like this: "VW Golf 2009". I can use the regex [0-9]+ to extract the 2009 part.
The problem arises when I have a string like this: "BMW 2013 i8". I want to extract the 2013 part, but not the 8 part.
Basically, I want to extract the "year" part of any string similar to the following:
BMW 2013 i8
VW Golf 2009
1938 CarCompany, inc. <insert car name here>
My 128th birthday is in the year 2014.
aui895h 2013 5qnui 89hth658h uab2 52h5h528h
etc.
Upvotes: 0
Views: 87
Reputation: 11132
(?<=^|\s)[0-9]+?(?=\s|$|\.(?=\s|$)|[;,\"'!?])
will work.
One advantage of this regex is that it can easily be modified.
Explanation:
(?<=^|\s) is a positive lookbehind.
  (?<= begins the positive lookbehind.
  ^|\s matches either of the following:
    ^ a start-of-string anchor,
    \s any whitespace character (including newlines).
  ) ends the positive lookbehind.
[0-9]+? is the heart of this regex.
  [0-9] matches a single character that is any digit (0123456789).
  +? is a lazy (reluctant) quantifier that repeats [0-9] one or more times, as few times as possible. Because the lookahead that follows cannot succeed in front of another digit, the match still covers the whole run of digits.
(?=\s|$|\.(?=\s|$)|[;,\"'!?]) is a positive lookahead.
  (?= begins the positive lookahead.
  \s|$|\.(?=\s|$)|[;,\"'!?] matches any of the following:
    \s any whitespace character (including newlines).
    $ an end-of-string anchor.
    \.(?=\s|$) the character ., if that character is immediately followed by whitespace or the end of the string.
    [;,\"'!?] any of these characters: ; , " ' ! ?
  ) ends the positive lookahead.
You can also find another good explanation here: http://regex101.com/r/pC6yA9
To implement this in Java, you can use this code:
// Backslashes are doubled because the regex lives in a Java string literal.
Matcher yearMatcher = Pattern.compile("(?<=^|\\s)[0-9]+?(?=\\s|$|[.,;](?=\\s|$))").matcher("BMW 2013 i8");
if (yearMatcher.find()) {
    String year = yearMatcher.group(); // "2013"
}
Make sure to import java.util.regex.*
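As a rough, self-contained sketch of the above, the following class runs the full regex from this answer against the sample strings from the question. The class name YearExtractor and the method firstStandaloneNumber are made up for illustration, and returning only the first match is an assumption about what the asker needs.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearExtractor {
    // The regex from this answer, with backslashes doubled for a Java string literal.
    private static final Pattern STANDALONE_NUMBER =
            Pattern.compile("(?<=^|\\s)[0-9]+?(?=\\s|$|\\.(?=\\s|$)|[;,\"'!?])");

    // Hypothetical helper: returns the first standalone number in the input, if any.
    public static Optional<String> firstStandaloneNumber(String input) {
        Matcher m = STANDALONE_NUMBER.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample strings taken from the question.
        String[] samples = {
            "BMW 2013 i8",
            "VW Golf 2009",
            "1938 CarCompany, inc. <insert car name here>",
            "My 128th birthday is in the year 2014.",
            "aui895h 2013 5qnui 89hth658h uab2 52h5h528h"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + firstStandaloneNumber(s).orElse("(no match)"));
        }
    }
}
```

Using Optional avoids calling group() when find() returned false, which would otherwise throw an IllegalStateException.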
Upvotes: 1
Reputation: 159
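I believe \d{4} will solve this nicely. It matches any run of four digits.
If you want to ensure that only a standalone 4-digit year word is matched, \W\d{4}\W will also work. Note that \W needs an actual non-word character on each side, so it will not match a year at the very start or end of the string, and the match includes those surrounding characters.
If you further want to match only "sensible" years (4 digits beginning with 19 or 20), you can use (19|20)\d{2}.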
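To see how these three patterns behave on the strings from the question, here is a small sketch. The class name FourDigitPatterns and the helper findAll are invented for this example. It simply collects every match so the differences are visible, including the surrounding characters that \W\d{4}\W consumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourDigitPatterns {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The three patterns from this answer, escaped for Java string literals.
        String[] patterns = { "\\d{4}", "\\W\\d{4}\\W", "(19|20)\\d{2}" };
        // Sample strings taken from the question.
        String[] samples = {
            "BMW 2013 i8",
            "VW Golf 2009",
            "My 128th birthday is in the year 2014."
        };
        for (String p : patterns) {
            for (String s : samples) {
                // Brackets in the printed list make leading/trailing spaces easier to spot.
                System.out.println(p + " on \"" + s + "\" -> " + findAll(p, s));
            }
        }
    }
}
```

If the surrounding characters are unwanted, wrapping the digits in a capturing group and reading m.group(1) is one possible adjustment.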
Upvotes: 1
Reputation: 1991
What about using the \b (word boundary) metacharacter (depending on your regex implementation), like so?
\b\d+\b
Or if you want a specific number of digits:
\b\d{4}\b
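For instance, in Java (one implementation that supports \b), the two patterns could be tried out roughly like this. The class and method names are hypothetical, and the "Route 66" string is a made-up input added only to show where the two patterns differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryNumbers {
    // Both patterns from this answer, escaped for Java string literals.
    private static final Pattern ANY_LENGTH = Pattern.compile("\\b\\d+\\b");
    private static final Pattern FOUR_DIGITS = Pattern.compile("\\b\\d{4}\\b");

    // Numbers of any length that are not glued to letters, digits, or underscores.
    public static List<String> standaloneNumbers(String input) {
        return findAll(ANY_LENGTH, input);
    }

    // Same idea, restricted to exactly four digits.
    public static List<String> standaloneFourDigitNumbers(String input) {
        return findAll(FOUR_DIGITS, input);
    }

    private static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // From the question: the 8 in "i8" has no word boundary before it.
        System.out.println(standaloneNumbers("BMW 2013 i8"));
        // From the question: "128th" is skipped because 8 is followed by a letter.
        System.out.println(standaloneNumbers("My 128th birthday is in the year 2014."));
        // Made-up input: the four-digit version drops the shorter number.
        System.out.println(standaloneNumbers("Route 66 opened in 1926"));
        System.out.println(standaloneFourDigitNumbers("Route 66 opened in 1926"));
    }
}
```

Because \b is zero-width, the matches contain only the digits, with no surrounding whitespace or punctuation.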
Upvotes: 1