Reputation: 310
I have a timestamp on a video in a format similar to Thurs May 7 10:21:02 1998. I'd like to extract this piece of text from the video. Note: the day name may be 3-5 characters long (e.g. Wed, Thurs) and the day of the month may be 1-2 digits.
I looked for similar questions on this platform but couldn't find one that uses regex to extract a date in this particular format, taking care of the spaces and the varying number of characters in the day name and the day of the month.
Here is my attempt:
import re
import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.open(file))
# date_time = re.findall(r'\d{2}:\d{2}:\d{2}', text)  # works fine; extracts the time as desired
date_time = re.findall(r'\d{3,4} \d{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}', text)  # doesn't work
print("timestamp: ", date_time)
Upvotes: 1
Views: 1429
Reputation: 626738
You can use
\w+\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}
See the regex demo. Details:
\w+ - one or more word chars
\s+ - one or more whitespaces
\w{3} - three word chars
\s+ - one or more whitespaces
\d{1,2} - one or two digits
\s+ - one or more whitespaces
\d{2}:\d{2}:\d{2} - two digits, :, two digits, : and two digits
\s+ - one or more whitespaces
\d{4} - four digits.
Upvotes: 2
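For reference, here is a minimal sketch of how the pattern from the answer could be dropped into the question's code. The helper name extract_timestamps and the sample string are hypothetical; the string only stands in for whatever pytesseract.image_to_string returns, so real OCR output may differ. The attempt in the question returns nothing because \d matches digits only, while the day and month names are letters.

```python
import re

# Pattern from the answer: day name, 3-letter month, day of month, time, year.
TIMESTAMP_RE = re.compile(r'\w+\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}')

# Pattern from the question, kept only for comparison.
ORIGINAL_RE = re.compile(r'\d{3,4} \d{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}')


def extract_timestamps(text):
    """Return every timestamp-like substring found in the text (hypothetical helper)."""
    return TIMESTAMP_RE.findall(text)


# Hypothetical OCR output; in the question this comes from pytesseract.
sample_text = "CAM 01\nThurs May 7 10:21:02 1998\nREC"

print("timestamp: ", extract_timestamps(sample_text))
print("original attempt: ", ORIGINAL_RE.findall(sample_text))
```

Because \w also matches digits and underscores, the pattern can pick up OCR noise next to the timestamp; swapping \w+ for [A-Za-z]{3,5} would be a stricter variant if that turns out to be a problem.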