How to fetch a specific piece of data from an HTML file

Question

I am trying to scrape data from an HTML file that contains a React props DIV like this:


What I am looking for is the duration, e.g. 11 months, 27 days, so that I can add the parts up to get an exact number of days.
I don't know how to extract this reliably, because the text varies: someone with exactly 2 years would have no days in the text at all. I need both the years and the days so I can do the calculation. I wrote this to find the part of the file that I need, but I don't know how to approach the rest:
with open("data.html", 'r') as fpIn:
    for line in fpIn:
        line = line.rstrip()   # Strip trailing spaces and newline
        if "targetUserDuration" in line:
            print("Found")

Alex Montano · Accepted Answer

Use regular expressions to find it.

import re

html = '..."targetUserDuration":"11 months, 27 days","""...'

# \d+ accepts any whole number, so values such as 10 years or 15 days
# are captured in full.
years_re = re.compile(r'UserDuration".*?(\d+) year.*?"""')
months_re = re.compile(r'UserDuration".*?(\d+) month.*?"""')
days_re = re.compile(r'UserDuration".*?(\d+) day.*?"""')

year_found = years_re.search(html)
months_found = months_re.search(html)
days_found = days_re.search(html)

years, months, days = 0, 0, 0
if year_found:
    years = int(year_found.group(1))
if months_found:
    months = int(months_found.group(1))
if days_found:
    days = int(days_found.group(1))

print('years: ', years)
print('months: ', months)
print('days: ', days)

Result:

years:  0
months:  11
days:  27
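
To get from these parts to the single number of days the question asks for, something along these lines might work. It is only a sketch under a few assumptions that are not in the original post: a year is counted as 365 days and a month as 30 days (the question does not say how to convert them), the raw file contains the key as literal text "targetUserDuration":"...", and the function name duration_to_days is made up for illustration. It first isolates the quoted value, then reads every number/unit pair inside it, so a missing unit simply counts as 0.

```python
import re

# Assumed conversion factors; the original post does not define them,
# so adjust these to whatever "exact" should mean for your data.
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30

# Captures the quoted value that follows the "targetUserDuration" key.
VALUE_RE = re.compile(r'"targetUserDuration"\s*:\s*"([^"]*)"')
# Matches one "<number> <unit>" pair such as "11 months" or "1 day".
PART_RE = re.compile(r'(\d+)\s+(year|month|day)s?')


def duration_to_days(text):
    """Return the approximate number of days, or None if the key is missing."""
    value = VALUE_RE.search(text)
    if value is None:
        return None
    parts = {"year": 0, "month": 0, "day": 0}
    for number, unit in PART_RE.findall(value.group(1)):
        parts[unit] = int(number)
    return (parts["year"] * DAYS_PER_YEAR
            + parts["month"] * DAYS_PER_MONTH
            + parts["day"])


print(duration_to_days('..."targetUserDuration":"11 months, 27 days","""...'))  # 357
print(duration_to_days('..."targetUserDuration":"2 years","""...'))  # 730
```

If you need calendar-accurate results instead of fixed factors, you would have to anchor the duration to a reference date, which the text alone does not provide.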
