Reputation: 499
Extract month name from raw string
'January 2045 Robots'
'2065 March Mars Colony'
'2089 December Alien'
I want to extract the month name from a raw string. My approach was to create a master tuple of month names:
s = 'January 2045 Robots'
months_master = ('january','feb','march','april','may','june','july','august','september','october','november','december')
month = [i for i in months_master if i in s.casefold()]
print(month[0])
'january'
Is there a more elegant or Pythonic way to achieve this?
Note: for now, the input string is required to contain only a single month (not multiple, as in s = 'May to December Bio').
Upvotes: 2
Views: 5559
Reputation: 150031
You could import the month names from the built-in calendar module and use a generator expression instead of a list comprehension, so that the search stops at the first match:
>>> from calendar import month_name
>>> s = 'January 2045 Robots'
>>> months = {m.lower() for m in month_name[1:]} # create a set of month names
>>> next((word for word in s.split() if word.lower() in months), None)
'January'
Alternatively, you could use a regular expression:
>>> from calendar import month_name
>>> import re
>>> pattern = '|'.join(month_name[1:])
>>> re.search(pattern, s, re.IGNORECASE).group(0)
'January'
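One caveat with the regular expression above: re.search returns None when no month is present, and a bare alternation also matches inside longer words (e.g. "May" in "Mayhem"). A minimal sketch that guards against both, assuming the default English locale; the names MONTH_RE and find_month are hypothetical, not part of the original answer:

```python
import re
from calendar import month_name

# Word-bounded, case-insensitive alternation of the twelve month names.
# month_name[0] is an empty string, so it is skipped with [1:].
MONTH_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, month_name[1:])) + r")\b",
    re.IGNORECASE,
)

def find_month(text):
    """Return the first month name found in text (original casing), or None."""
    match = MONTH_RE.search(text)
    return match.group(0) if match else None

print(find_month('2065 March Mars Colony'))
print(find_month('Mayhem 2045'))
```

The match is returned with the casing it has in the input, so normalise it with .title() or .lower() if you need a canonical form.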
Upvotes: 5
Reputation: 10359
The calendar module provides an array-like object of localised month names called month_name. It includes an empty string at index 0, so you need to filter that out, and the months appear in title case ("January" etc.), so you need to account for that too. We do this with if x and x in s.title(): when x is the empty string, the condition evaluates to False.
from calendar import month_name
s = 'January 2045 Robots'
month = [x for x in month_name if x and x in s.title()]
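To see why the truthiness check matters, here is a small sketch comparing the comprehension with and without it, assuming the default English locale. The variable names unguarded and guarded are illustrative only:

```python
from calendar import month_name

s = '2089 december Alien'

# month_name[0] is an empty placeholder so that month_name[1] is January.
print(repr(month_name[0]))

# Without the check, '' is kept because the empty string is a
# substring of every string.
unguarded = [x for x in month_name if x in s.title()]

# With the check, only real month names survive.
guarded = [x for x in month_name if x and x in s.title()]

print(unguarded)
print(guarded)
```

Note that the result is a list, so take guarded[0] (after checking it is non-empty) if you want a single string.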
Upvotes: 1
Reputation: 166
You can store your months in a set instead of a tuple and check whether each word is in this set. This reduces the time complexity from O(N*M), where N is the length of the string and M is the length of the months_master tuple, to just O(N).
Something like this:
months_master = {"january", "february", ...}  # a set literal; set() accepts only one iterable argument
month = [word for word in s.casefold().split() if word in months_master]
Upvotes: 0
Reputation: 5389
Split the text into words (or tokenize it) and check whether each word is in the month list:
text = 'January 2045 Robots'
month_master = ('january','february','march','april','may','june','july','august','september','october','november','december')
month_found = [word for word in text.split() if word.lower() in month_master]
# output ['January']
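A plain split() leaves punctuation attached to words, so 'January,' would not match. A minimal sketch that strips surrounding punctuation before the lookup, assuming the default English locale; MONTHS and months_in are hypothetical names, and the month set is built from calendar.month_name rather than typed by hand:

```python
import string
from calendar import month_name

# Lower-cased month names; [1:] skips the empty string at index 0.
MONTHS = {m.lower() for m in month_name[1:]}

def months_in(text):
    """Return every whitespace-separated word that is a month name."""
    # Strip leading/trailing punctuation such as commas or parentheses.
    words = (w.strip(string.punctuation) for w in text.split())
    return [w for w in words if w.lower() in MONTHS]

print(months_in('Launch: January, 2045 (Robots)'))
```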
Upvotes: 0