flappix

Reputation: 2217

Remove recognized date from string

As input I have several strings containing dates in different formats like

I use dateutil.parser.parse to recognize the dates in the strings.
In the next step I want to remove the dates from the strings. Result should be

Is there a simple way to achieve this?

Upvotes: 2

Views: 9563

Answers (4)

Mo'men Ahmed

Reputation: 61

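import re

def remove_dates(sentence):
    """Remove dates like 'Mar 30 2013'."""
    # Month abbreviation, two-digit day, four-digit year, separated by single whitespace characters.
    sentence = re.sub(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{2}\s\d{4}', ' ', sentence)
    return sentence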

Test:

remove_dates(' good Mar 30 2013 day')

' good   day'
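
Note that the pattern above only matches two-digit days and leaves the surrounding spaces in place. A minimal sketch of a looser variant, assuming you also want one-digit days, an optional comma after the day, and collapsed whitespace (the names DATE_RE and remove_dates_loose are made up for this example):

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Month abbreviation, 1-2 digit day, optional comma, 4-digit year.
DATE_RE = re.compile(r"\b(?:%s)\s+\d{1,2},?\s+\d{4}\b" % MONTHS)

def remove_dates_loose(sentence):
    """Hypothetical variant: drop the date, then collapse leftover whitespace."""
    without_dates = DATE_RE.sub(" ", sentence)
    # split() with no arguments also strips leading and trailing whitespace.
    return " ".join(without_dates.split())
```

It still only covers the "Mon DD YYYY" shape, so times and numeric dates such as 08-07-1990 pass through untouched.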

Upvotes: 0

Paul

Reputation: 10863

You can use the fuzzy_with_tokens option to dateutil.parser.parse:

from dateutil.parser import parse

dtstrs = [
    "Peter drinks tea at 16:45",
    "My birthday is on 08-07-1990",
    "On Sat 11 July I'll be back home",
    ]

out = [
    parse(dtstr, fuzzy_with_tokens=True)
    for dtstr in dtstrs
]

Result:

[(datetime.datetime(2018, 7, 17, 16, 45), ('Peter drinks tea at ',)),
 (datetime.datetime(1990, 8, 7, 0, 0), ('My birthday is on ',)),
 (datetime.datetime(2018, 7, 11, 0, 0), ('On ', ' ', " I'll be back home"))]

When fuzzy_with_tokens is true, the parser returns a tuple of a datetime and a tuple of ignored tokens (with the used tokens removed). You can join them back into a string like this:

>>> ['<missing>'.join(x[1]) for x in out]
['Peter drinks tea at ',
 'My birthday is on ',
 "On <missing> <missing> I'll be back home"]

I'll note that the fuzzy parsing logic is not amazingly reliable, because it's very difficult to pick out only valid components from a string and use them. If you change the person drinking tea to someone named April, for example:

>>> dt, tokens = parse("April drinks tea at 16:45", fuzzy_with_tokens=True)
>>> print(dt)
2018-04-17 16:45:00
>>> print('<missing>'.join(tokens))
 drinks tea at 

So I would urge some caution with this approach, though I can't really recommend a better one; this is just a hard problem.
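
As a rough sketch of how the pieces above fit together, you could wrap the call in a small helper that joins the ignored tokens and collapses the whitespace left behind. The name strip_date is made up for this example and is not part of dateutil. Returning the input unchanged when no date is found is my own assumption about the desired behaviour.

```python
from dateutil.parser import parse

def strip_date(text):
    """Hypothetical helper: return text with the tokens dateutil used as a date removed."""
    try:
        _, tokens = parse(text, fuzzy_with_tokens=True)
    except (ValueError, OverflowError):
        # The parser found nothing it could use as a date; keep the string as-is.
        return text
    # Join the ignored tokens, then collapse runs of whitespace into single spaces.
    return " ".join(" ".join(tokens).split())

print(strip_date("Peter drinks tea at 16:45"))
print(strip_date("My birthday is on 08-07-1990"))
```

The caveat about names like April still applies: the helper removes whatever the fuzzy parser decides is part of a date.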

Upvotes: 4

Sunitha

Reputation: 12005

If you define a function that checks whether a string parses as a date, you can do this in a one-liner:

from dateutil import parser

data = ['Peter drinks tea at 16:45', 'My birthday is on 08-07-1990', "On Sat 11 July I'll be back home"]

def is_valid_date(date_str):
    try:
        parser.parse(date_str)
        return True
    except (ValueError, OverflowError):  # raised by parser.parse when the string is not a date
        return False

new_list = [' '.join([w for w in line.split() if not is_valid_date(w)]) for line in data]
print(new_list)
# ['Peter drinks tea at', 'My birthday is on', "On I'll be back home"]

Upvotes: 1

melike

Reputation: 147

You can use re.findall() to find the dates and then remove them from your string. I think the code in the answer linked below can solve your problem.

https://stackoverflow.com/a/2770062/9721027
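
The linked code isn't reproduced here. As a minimal sketch of the find-then-remove idea, assuming the dates are numeric and look like 08-07-1990 (the pattern and the name find_and_remove are made up for this example):

```python
import re

# Assumed pattern: numeric dates such as 08-07-1990 (two digits, two digits, four digits).
DATE_PATTERN = r"\b\d{2}-\d{2}-\d{4}\b"

def find_and_remove(text):
    """Return (dates found, text with those dates removed)."""
    dates = re.findall(DATE_PATTERN, text)
    for date in dates:
        text = text.replace(date, "")
    # Collapse the whitespace left where the dates used to be.
    return dates, " ".join(text.split())

print(find_and_remove("My birthday is on 08-07-1990"))
```

Each date format you expect needs its own pattern, so this only works well when the formats are known in advance.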

Upvotes: 0
