sztyrymytyry

Reputation: 151

Converting a Russian string to datetime

I am trying to scrape a Russian website. However, I am stuck trying to convert a Russian Cyrillic string to a datetime object.

Let's take this piece of HTML as an example:

<div class="medium-events-list_datetime">22 января весь день</div>

I am able to fetch the content of this div using lxml, i.e.:

date = root.xpath('/html/body/div[1]/div/div[2]/text()')[0].strip()

So the relevant part of this string is 22 января, which is the day and the month.

To get this part I am using the .split() method.
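The question does not show the split itself, so here is a minimal sketch of what that step might look like, assuming the day and month are always the first two whitespace-separated tokens. The variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Text of the div, as it would look after .strip()
raw_text = "22 января весь день"

# split() with no arguments splits on any run of whitespace
tokens = raw_text.split()

# Assumption: the first two tokens are the day and the month name
date_part = " ".join(tokens[:2])

print(date_part)
```

If the page ever puts something else in front of the date, the slice needs adjusting, so it is worth printing tokens while debugging.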

Now here lies the problem: I am trying to convert this into a datetime. I am trying to use dateparser (https://dateparser.readthedocs.org/en/latest/), which is supposed to support Russian.

However, it returns None when I pass this string to dateparser.parse().

Has anyone run into a similar issue? I am banging my head against the wall on this one. Any help appreciated :)

Upvotes: 2

Views: 1233

Answers (1)

dd23

Reputation: 381

Try running this example:

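# coding=utf-8
import dateparser

s = u"22 января"  # a Unicode string holding only the day and the month
print(dateparser.parse(s))  # print() with one argument works on Python 2 and 3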

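It should output 2016-01-22 00:00:00 (the string carries no year, so the current one is filled in, which was 2016 at the time of writing).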
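If dateparser keeps returning None, a hand-rolled fallback can at least confirm that the extracted string is sane. This is only a sketch using the standard library, not how dateparser works internally. RU_MONTHS_GENITIVE and parse_day_month are hypothetical names, and the year has to be passed in explicitly because the string does not contain one.

```python
# -*- coding: utf-8 -*-
from datetime import datetime

# Russian month names in the genitive case, as they appear after a day number
RU_MONTHS_GENITIVE = {
    "января": 1, "февраля": 2, "марта": 3, "апреля": 4,
    "мая": 5, "июня": 6, "июля": 7, "августа": 8,
    "сентября": 9, "октября": 10, "ноября": 11, "декабря": 12,
}


def parse_day_month(text, year):
    """Parse a string like '22 января' into a datetime in the given year."""
    day_str, month_name = text.split()[:2]
    # Raises KeyError if the month name is not in the table
    month = RU_MONTHS_GENITIVE[month_name.lower()]
    return datetime(year, month, int(day_str))


print(parse_day_month("22 января", year=2016))  # 2016-01-22 00:00:00
```

A KeyError from the lookup usually means the month token is not what it looks like on screen, which points back at an encoding or splitting problem.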

Important: Make sure that you're actually using UTF-8 strings. More info on declaring the source file encoding: https://www.python.org/dev/peps/pep-0263/

Otherwise your parsing/splitting might go wrong, so have a look at the results after the split().
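One way to have that look, sketched for Python 3 with made-up variable names. The bytes here are a stand-in for whatever the scraper actually downloaded. The point is to compare a correct UTF-8 decode with a wrong one and to print an unambiguous, escaped form of the piece you hand to the parser.

```python
# -*- coding: utf-8 -*-
# Stand-in for the raw bytes of the div text as received from the site
raw_bytes = "22 января весь день".encode("utf-8")

# Correct decode: the Cyrillic letters come back intact
text = raw_bytes.decode("utf-8")
tokens = text.split()
date_part = " ".join(tokens[:2])

# Wrong decode: every byte becomes its own character, producing mojibake
mojibake = raw_bytes.decode("latin-1")

# ascii() shows the exact code points, which makes stray characters visible
print(type(text).__name__, ascii(date_part))
print(ascii(mojibake.split()[1]))
```

If the escaped form of the month does not consist of \u04xx code points, the text was decoded with the wrong codec somewhere before the split.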

Upvotes: 4
