still_learning

Reputation: 806

Extract date/time information from a string

I have some texts that generally start with:

“12 minutes ago - There was a meeting...”
“2 hours ago - Apologies for being...”
“1 day ago - It is a sunny day in London...”

and so on. Basically I have information on:

Minutes 
Hours
Day (starting from today)

I would like to turn this information into proper time series data: extract this part of the text and create a new column (Datetime) from it. My dataset already has a column (Date) holding the date the search was performed (for example, today) in the format 26/05/2020, and the time the search was submitted (e.g. 8:41am). So if the text starts with "12 minutes ago", I should get:

26/05/2020 - 8:29 (datetime format in Python)

And for the others:

26/05/2020 - 6:41
25/05/2020 - 8:41
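The expected values above follow from plain datetime arithmetic. A minimal sketch, assuming the search date and submission time are available as the strings "26/05/2020" and "8:41am" (the variable name search_time is illustrative):

```python
from datetime import datetime, timedelta

# Combine the search date (Date column) and the submission time into one datetime
search_time = datetime.strptime("26/05/2020 8:41am", "%d/%m/%Y %I:%M%p")

# Subtract the relative offset found at the start of each text
print(search_time - timedelta(minutes=12))  # 2020-05-26 08:29:00
print(search_time - timedelta(hours=2))     # 2020-05-26 06:41:00
print(search_time - timedelta(days=1))      # 2020-05-25 08:41:00
```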

The important thing is to have something (a string, number, or date) that I can plot as a time series (I would like to see how many texts were posted per time interval). Any idea how I could do this?

Upvotes: 0

Views: 400

Answers (2)

azro

Reputation: 54148

If the format stays simple (<digits> <unit> ago ...), it's pretty easy to parse with the regex "^(\d+) (\w+) ago".
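To see what that pattern captures, here is a quick check (re.match behaves like re.search here because of the ^ anchor). Note that the groups come back in the order (value, unit):

```python
import re

pattern = re.compile(r"^(\d+) (\w+) ago")

m = pattern.match("12 minutes ago - There was a meeting...")
print(m.groups())  # ('12', 'minutes')

# A text that does not start with "<digits> <unit> ago" gives no match
print(pattern.match("Yesterday - something else"))  # None
```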

Then, once you have the captured groups, e.g. ('12', 'minutes'), pass them to timedelta, which accepts units such as minutes, hours and days as keyword arguments (timedelta(minutes=12)). You can do that by unpacking a mapping: timedelta(**{unit: value}).
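The keyword-unpacking step can be tried on its own. timedelta only knows plural keywords (weeks, days, hours, minutes, seconds, ...), so a singular unit like "day" has to be pluralised first; units such as months or years are not supported at all and would need separate handling. A minimal sketch, where to_timedelta is a hypothetical helper name:

```python
from datetime import timedelta

def to_timedelta(value, unit):
    # Hypothetical helper: normalise "day" -> "days"; "minutes" stays "minutes"
    unit = unit.rstrip("s") + "s"
    # Equivalent to e.g. timedelta(minutes=12)
    return timedelta(**{unit: int(value)})

print(to_timedelta("12", "minutes"))  # 0:12:00
print(to_timedelta("1", "day"))       # 1 day, 0:00:00
```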

import re
from datetime import datetime, timedelta

def parse(content, reference=datetime(2020, 5, 26, 8, 0, 0)):
    # Match "<digits> <unit> ago" at the very start of the text
    timeparts = re.search(r"^(\d+) (\w+) ago", content)
    if not timeparts:
        return None  # no relative date found
    unit = timeparts.group(2).rstrip('s') + 's'  # timedelta wants plural keywords: day -> days
    # Fixed reference date here; pass reference=datetime.now() to use the current time
    return reference - timedelta(**{unit: int(timeparts.group(1))})

Demo

values = ["12 minutes ago - There was a meeting...","2 hours ago - Apologies for being...","1 day ago - It is a sunny day in London..."]

for value in values:
  res = parse(value)
  print(res)


2020-05-26 07:48:00
2020-05-26 06:00:00
2020-05-25 08:00:00
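To get from these datetimes to the "how many texts per time interval" count asked about in the question, one option is to truncate each timestamp to the start of its hour and count per bucket. A minimal standard-library sketch, assuming the search was submitted at 26/05/2020 8:41 (SEARCH_TIME is an assumed constant) and using one made-up extra text ("25 minutes ago - Another post...") so that one bucket holds more than one post:

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed reference: the date/time the search was submitted
SEARCH_TIME = datetime(2020, 5, 26, 8, 41)

def parse(content, reference=SEARCH_TIME):
    # Same idea as parse() above: "<digits> <unit> ago" at the start of the text
    timeparts = re.search(r"^(\d+) (\w+) ago", content)
    if not timeparts:
        return None
    unit = timeparts.group(2).rstrip("s") + "s"
    return reference - timedelta(**{unit: int(timeparts.group(1))})

texts = [
    "12 minutes ago - There was a meeting...",
    "25 minutes ago - Another post...",  # made-up example text
    "2 hours ago - Apologies for being...",
    "1 day ago - It is a sunny day in London...",
]

# Drop texts without a relative date, then count posts per hourly bucket
posted = [dt for dt in map(parse, texts) if dt is not None]
per_hour = Counter(dt.replace(minute=0, second=0, microsecond=0) for dt in posted)

for bucket, count in sorted(per_hour.items()):
    print(bucket, count)
```

The sorted (bucket, count) pairs can be plotted directly as a time series. If the data already lives in a pandas DataFrame, the same grouping could instead be done by resampling on the new Datetime column.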

Upvotes: 2

Nick

Reputation: 3845

You should use a natural language processing library for this, such as spaCy or NLTK.

The spaCy documentation includes a tokenization example showing how spaCy breaks a sentence down into tokens.

Upvotes: 0
