I am trying to scrape a Russian website. However, I am stuck trying to convert a Russian (Cyrillic) date string to a datetime object.
Take this HTML snippet, for example:
<div class="medium-events-list_datetime">22 января весь день</div>
I am able to fetch the content of this div using lxml, i.e.:
date = root.xpath('/html/body/div[1]/div/div[2]/text()')[0].strip()
So the relevant part of this string is 22 января, which is the day and the month.
To get this part, I am using the .split() method.
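A minimal sketch of that split step, assuming the div text always starts with the day number followed by the month name. It is written for Python 3, and the helper name `day_and_month` is hypothetical:

```python
def day_and_month(text):
    """Return the first two whitespace-separated tokens, e.g. '22 января'."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, so stray spaces are harmless.
    tokens = text.split()
    return " ".join(tokens[:2])


raw = "  22 января весь день "
date_part = day_and_month(raw)  # "22 января"
```

Splitting with no argument (rather than `split(" ")`) avoids empty tokens when the scraped text contains double spaces or newlines.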
Here lies the problem: I am trying to convert this into a datetime. I tried DateParser (https://dateparser.readthedocs.org/en/latest/), which is supposed to support Russian.
However, it returns None when I pass this string to dateparser.parse().
Has anyone run into a similar issue? I am banging my head against the wall on this one. Any help appreciated :)
Try running this example:
#coding=utf-8
import dateparser
s = u"22 января"  # a unicode literal, not a byte string
print(dateparser.parse(s))
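It should output 2016-01-22 00:00:00 (the string has no year, so dateparser fills it in from the current date; the year you see depends on when you run it).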
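If dateparser still returns None, one stdlib-only fallback is to map the month name yourself. This is a different technique from dateparser, shown as a minimal Python 3 sketch: the month table is written out by hand, it only covers the "day month" pattern from the question, and the names `RU_MONTHS_GENITIVE` and `parse_ru_day_month` are hypothetical:

```python
import datetime

# Russian month names in the genitive case, as they appear after a day
# number ("22 января"). Hand-written table for this sketch only.
RU_MONTHS_GENITIVE = {
    "января": 1, "февраля": 2, "марта": 3, "апреля": 4,
    "мая": 5, "июня": 6, "июля": 7, "августа": 8,
    "сентября": 9, "октября": 10, "ноября": 11, "декабря": 12,
}


def parse_ru_day_month(text, year=None):
    """Parse strings like '22 января' into a datetime.datetime.

    If year is None, the current year is used, since the text has none.
    Raises ValueError for malformed input, an unknown month name,
    or a day that does not exist in that month.
    """
    # Unpacking raises ValueError if there are fewer than two tokens.
    day_str, month_name = text.split()[:2]
    month = RU_MONTHS_GENITIVE.get(month_name.lower())
    if month is None:
        raise ValueError("unknown month name: %r" % month_name)
    if year is None:
        year = datetime.date.today().year
    # int() and datetime() both raise ValueError on bad values.
    return datetime.datetime(year, month, int(day_str))


print(parse_ru_day_month("22 января", year=2016))  # prints: 2016-01-22 00:00:00
```

Passing the year explicitly keeps the result reproducible; leaving it as None mirrors filling in the current year for a date string that has none.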
Important: Make sure that you're actually using utf-8 strings. More info: https://www.python.org/dev/peps/pep-0263/
Otherwise your parsing/splitting might be wrong, so have a look at the results after split().
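A minimal sketch of that check, written for Python 3, where decoded text is `str` and undecoded data is `bytes` (in the Python 2 example above the corresponding types are `unicode` and `str`). The helper name `to_text` is hypothetical, and it assumes the raw data is UTF-8:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "22 января".encode("utf-8")  # an undecoded byte string

# Splitting the raw bytes "works", but yields bytes tokens.
print([type(t).__name__ for t in raw.split()])  # prints: ['bytes', 'bytes']

# Decoding first yields text tokens that a date parser can handle.
tokens = to_text(raw).split()
print([type(t).__name__ for t in tokens])  # prints: ['str', 'str']
```

Decoding once, right after fetching, keeps every later step (split, lookup, parsing) working on text rather than bytes.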