Tarun K

Reputation: 499

Extract month name from raw string?


'January 2045 Robots'
'2065 March Mars Colony'
'2089 December Alien'

I want to extract the month name from a raw string. My approach was to create a master tuple of month names and check each one against the string:

s = 'January 2045 Robots'
months_master = ('january','feb','march','april','may','june','july','august','september','october','november','december')
month = [i for i in months_master if i in s.casefold()]
print(month[0])
'january'

Is there a more elegant or Pythonic way to achieve this?

Note: For now, the requirement is that the input string contains only a single month (not multiple, like s = 'May to December Bio').

Upvotes: 2

Views: 5559

Answers (4)

Eugene Yarmash

Reputation: 150031

You could import the month names from the standard-library calendar module and use a generator expression with next() instead of a list comprehension, so the search stops at the first match:

>>> from calendar import month_name
>>> s = 'January 2045 Robots'
>>> months = {m.lower() for m in month_name[1:]}  # create a set of month names
>>> next((word for word in s.split() if word.lower() in months), None)
'January'

Alternatively, you could use a regular expression:

>>> from calendar import month_name
>>> import re
>>> pattern = '|'.join(month_name[1:])
>>> re.search(pattern, s, re.IGNORECASE).group(0)
'January'
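
One caveat with the plain alternation: it matches substrings, so May would also be found inside a word such as Mayhem, and re.search returns None when nothing matches, which makes .group(0) raise. A minimal sketch that guards against both, assuming whole-word matches are what you want and the default English locale (find_month is a hypothetical helper name, not part of any library):

```python
import re
from calendar import month_name

# Build the alternation once; \b anchors each match to a whole word.
MONTH_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, month_name[1:])) + r")\b",
    re.IGNORECASE,
)

def find_month(text):
    """Return the first month name found in text, or None (hypothetical helper)."""
    match = MONTH_RE.search(text)
    return match.group(0) if match else None

print(find_month('2065 March Mars Colony'))  # March
print(find_month('Mayhem 2045'))             # None
```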

Upvotes: 5

asongtoruin

Reputation: 10359

The calendar module provides month_name, an array-like sequence of localised month names (not a generator). It includes an empty string at index 0, so you need to skip that, and the names are capitalised ("January" etc.), so you need to account for case too. We do both with if x and x in s.title(): when x is the empty string, the condition evaluates to False.

from calendar import month_name
s = 'January 2045 Robots'
month = [x for x in month_name if x and x in s.title()]
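
To see why the `if x` guard matters, here is a small sketch (assuming the default English locale; the variable names `unguarded` and `guarded` are only for illustration). The empty string is a substring of every string, so without the guard it ends up in the result:

```python
from calendar import month_name

s = 'January 2045 Robots'

# month_name has 13 entries; index 0 is an empty placeholder so that
# month_name[1] lines up with January.
print(len(month_name), repr(month_name[0]))  # 13 ''

# '' in any_string is always True, so the unguarded version keeps it.
unguarded = [x for x in month_name if x in s.title()]
guarded = [x for x in month_name if x and x in s.title()]

print(unguarded)  # ['', 'January']
print(guarded)    # ['January']
```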

Upvotes: 1

Lemx

Reputation: 166

You can store your months in a set instead of a tuple and check whether each word is in that set. This reduces the time complexity from O(N*M), where N is the number of words in the string and M is the length of the months_master tuple, to O(N) on average, since a set lookup is constant time. Something like this:

    months_master = {"january", "february", ...}  # set literal; set() accepts only one iterable argument
    month = [word for word in s.casefold().split() if word in months_master]

Upvotes: 0

titipata

Reputation: 5389

Split the text into words (or tokenize it) and check whether each word is in the month tuple:

text = 'January 2045 Robots'
month_master = ('january','february','march','april','may','june','july','august','september','october','november','december')  # full names, since each whole word is compared
month_found = [word for word in text.split() if word.lower() in month_master]

# output ['January']
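
Plain split() keeps punctuation attached to words, so 'January,' would not match. As a rough sketch of the tokenizing variant, assuming month names are made of ASCII letters only (months_in and MONTHS are hypothetical names used for illustration):

```python
import re

MONTHS = ('january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december')

def months_in(text):
    """Return the month names found in text, in order (hypothetical helper)."""
    # [A-Za-z]+ pulls out runs of letters, so 'January,' becomes 'January'.
    words = re.findall(r'[A-Za-z]+', text)
    return [word for word in words if word.lower() in MONTHS]

print(months_in('Launch: January, 2045'))  # ['January']
print('January,'.lower() in MONTHS)        # False: split() would keep the comma
```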

Upvotes: 0
