user124114

Reputation: 8692

regexp for extracting years

Given a string, how do I extract all sequences of exactly 4 digits?

That is, for 1234 12 12345 1bc5 9876 I want to get [1234, 9876].

I got as far as re.findall('\D\d\d\d\d\D'), but that fails on text boundaries (when there's no character before/after a match).


A solution in Python 2.7 is preferred, but I guess this is pretty general, so any language will do.

Upvotes: 1

Views: 98

Answers (1)

georg

Reputation: 214949

The general answer is surprisingly complicated; see here for more info. However, in this particular case we can simply use a word-boundary assertion, \b:

re.findall(r'\b\d{4}\b', ....)
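As a quick check, here is a minimal sketch that runs the pattern on the sample string from the question. The variable names are illustrative only. It also shows one common alternative, digit lookarounds, which is an assumption on my part and not necessarily the approach the link above describes. The two patterns agree on the sample input but differ when the digits touch a letter or underscore, because \b treats those as word characters too.

```python
import re

# Sample input taken from the question.
text = "1234 12 12345 1bc5 9876"

# Word-boundary version from the answer above.
by_word_boundary = re.findall(r'\b\d{4}\b', text)

# Alternative (an assumption, not from the answer): four digits that are
# neither preceded nor followed by another digit.
by_lookaround = re.findall(r'(?<!\d)\d{4}(?!\d)', text)

print(by_word_boundary)  # ['1234', '9876']
print(by_lookaround)     # ['1234', '9876']

# The two differ when digits are glued to letters: there is no word
# boundary between "r" and "2", so \b finds nothing here.
glued = "year2014"
print(re.findall(r'\b\d{4}\b', glued))           # []
print(re.findall(r'(?<!\d)\d{4}(?!\d)', glued))  # ['2014']
```

Note that findall returns strings, so wrap the results in int() if numbers are needed. Both patterns work unchanged in Python 2.7 and Python 3.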

Upvotes: 7
