Reputation: 1170
The dates in my data are stored in two different formats:
Dienstag 31. Dezember 2013
and 30. Juni 2007
I wrote scripts to extract the year, month and day from both formats and store them in a list:
for row in reader:
    line_count = line_count + 1
    if row[1] == "DATE":
        pass  # header row
    else:
        date = row[1].encode('utf-8')
        year = date.split('.')[1].split(" ")[2]
        day = date.split(" ")[1]  # index 0 is the weekday ("Dienstag")
        day = day.replace('.', '')
        month = date.split('.')[1].split(' ')[1]
for the first format, and
date = row[1].encode('utf-8')
year = date.split('.')[1].split(" ")[2]
day = date.split(" ")[0]
day = day.replace('.', '')
month = date.split('.')[1].split(' ')[1]
for the second format
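Tracing the split calls on the two sample strings shows why only the day needs different handling: the month and year sit at the same positions after the first ".", but the day token is at index 1 when a weekday precedes it and at index 0 otherwise. This is a small sketch on plain Python 3 strings (without the .encode step); the helper name tokens is hypothetical.

```python
def tokens(date):
    """Return the two token lists the question's code indexes into."""
    by_space = date.split(" ")
    after_dot = date.split(".")[1].split(" ")
    return by_space, after_dot

for s in ["Dienstag 31. Dezember 2013", "30. Juni 2007"]:
    by_space, after_dot = tokens(s)
    print(repr(s))
    print("  date.split(' ')               ->", by_space)
    print("  date.split('.')[1].split(' ') ->", after_dot)
```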
However, the two formats occur randomly throughout the dataset (in row[1]). Is there a way to have Python detect which format it has encountered and run the matching code (for example, with an if statement)?
Thanks.
Upvotes: 0
Views: 99
Reputation: 10951
Another approach with regex, just to give you more options:
import re

if re.search(r'^[a-zA-Z]', date):
    # Method for first format (starts with the weekday)
    pass
else:
    # Method for second format (starts with the day number)
    pass
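Filled in with the question's split logic, the regex check might look like the sketch below. It assumes Python 3 strings (no .encode step) and that weekday-first strings always carry exactly one weekday word; the function names are hypothetical. re.match only tries the start of the string, so it does the same job as re.search with a ^ anchor.

```python
import re

def parse_with_weekday(date):
    # First format, e.g. "Dienstag 31. Dezember 2013": the weekday comes first.
    day = date.split(" ")[1].replace(".", "")
    month = date.split(".")[1].split(" ")[1]
    year = date.split(".")[1].split(" ")[2]
    return day, month, year

def parse_without_weekday(date):
    # Second format, e.g. "30. Juni 2007": the day number comes first.
    day = date.split(" ")[0].replace(".", "")
    month = date.split(".")[1].split(" ")[1]
    year = date.split(".")[1].split(" ")[2]
    return day, month, year

def parse_date(date):
    # A leading ASCII letter means the string starts with a weekday name.
    if re.match(r"[a-zA-Z]", date):
        return parse_with_weekday(date)
    return parse_without_weekday(date)

print(parse_date("Dienstag 31. Dezember 2013"))  # ('31', 'Dezember', '2013')
print(parse_date("30. Juni 2007"))               # ('30', 'Juni', '2007')
```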
Upvotes: 0
Reputation: 3158
I don't know whether you are required to use split(), but regular expressions are better suited to this kind of problem. They are robust yet flexible: you can easily extend the pattern if you expect more formats (for example, American style such as January 31, 2004). Five lines of code rather than the original 15 ;)
Here's the code:
import re

# Optional weekday, then day number, month name and year
reg_date = r"(Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag)?\s*(\d{1,2})\.\s+(\w{3,12})\s(\d{2,4})"

def extract_date(string):
    results = re.search(reg_date, string)
    if results:
        date = results.groups()
        return date[1], date[2], date[3]
    # Falls through and returns None when no date is found
And to use this, simply write a line like:
day,month,year = extract_date("Dienstag 31. Dezember 2013 and ")
print day,month,year
Or try it with the second format:
day,month,year = extract_date("31. May 2013 ")
print day,month,year
Simple, Elegant, Reusable.
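A Python 3 variant of the same idea, as a hedged sketch: the weekday is an optional, non-capturing group, the pattern is a raw string so the backslashes reach the regex engine unchanged, and the caller checks for None before unpacking (unpacking None into day, month, year would raise a TypeError). In Python 3, \w also matches umlauts such as the ä in März. The name DATE_RE is hypothetical.

```python
import re

# Optional German weekday followed by whitespace, then "DD. Monthname YYYY".
# \w matches non-ASCII letters because Python 3 str patterns are Unicode.
DATE_RE = re.compile(
    r"(?:(?:Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag)\s+)?"
    r"(\d{1,2})\.\s+(\w{3,12})\s+(\d{2,4})"
)

def extract_date(text):
    """Return (day, month, year) as strings, or None if no date is found."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    return match.groups()

for text in ["Dienstag 31. Dezember 2013", "30. Juni 2007", "1. März 2010", "no date"]:
    result = extract_date(text)
    if result is None:
        print("no date in", repr(text))
    else:
        day, month, year = result
        print(day, month, year)
```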
Upvotes: 2
Reputation: 5642
You can check if the first character in the string is alpha.
if date[0].isalpha():
    pass  # call your function for dates that start with a weekday here
else:
    pass  # call the other function
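A variation on the same check, sketched under the assumption that every weekday-first string has exactly one weekday word followed by a space: instead of keeping two extraction functions, strip the weekday and reuse a single one. The helper names normalize and parse_date are hypothetical.

```python
def normalize(date):
    # If the first character is a letter, the string starts with a weekday
    # ("Dienstag 31. Dezember 2013"); drop that first word so both formats
    # look like "31. Dezember 2013". date[:1] avoids an IndexError on "".
    if date[:1].isalpha():
        date = date.split(" ", 1)[1]
    return date

def parse_date(date):
    # One extraction routine now covers both formats.
    day_token, month, year = normalize(date).split()
    return day_token.rstrip("."), month, year

print(parse_date("Dienstag 31. Dezember 2013"))  # ('31', 'Dezember', '2013')
print(parse_date("30. Juni 2007"))               # ('30', 'Juni', '2007')
```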
Upvotes: 1
Reputation: 8335
This works if and only if the second pattern always starts with a number:
if date[0].isdigit():
    pass  # method for pattern 2 (starts with the day number)
else:
    pass  # method for pattern 1 (starts with the weekday)
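Put back into the question's csv loop, the isdigit() check could look like the sketch below. The sample rows and the offset approach are assumptions for illustration (it relies on the weekday being a single word); the header test mirrors the question's row[1] == "DATE" check, and the results are collected as (year, month, day) tuples in a list.

```python
import csv
import io

# Hypothetical sample standing in for the asker's file: the date is in
# column 1 and the header row contains "DATE" there.
SAMPLE = "id,DATE\n1,Dienstag 31. Dezember 2013\n2,30. Juni 2007\n"

dates = []  # collected (year, month, day) tuples
for row in csv.reader(io.StringIO(SAMPLE)):
    date = row[1]
    if date == "DATE":
        continue  # skip the header row
    words = date.split(" ")
    # Pattern 2 ("30. Juni 2007") starts with a digit; pattern 1 has the
    # weekday in front, so every field sits one position further right.
    offset = 0 if date[0].isdigit() else 1
    day = words[offset].replace(".", "")
    month = words[offset + 1]
    year = words[offset + 2]
    dates.append((year, month, day))

print(dates)  # [('2013', 'Dezember', '31'), ('2007', 'Juni', '30')]
```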
Upvotes: 2