Reputation: 20409
I have dates in a text format like:
6 weeks ago, 2012 April 18 15:08:18
13 weeks ago, 2012 March 01 17:33:52
The main problem is that these texts are actually in Russian, so instead of "weeks ago"
there is the equivalent text in Russian. The same goes for the month names (it looks like I should create a dictionary of possible values).
I don't know how to start. Should I use regular expressions? Something else?
Upvotes: 3
Views: 446
Reputation: 340763
Not Russian, but Polish:
var dateStr = "6 tygodni temu, 2012 kwiecień 18 15:08:18"
Firefox has no problem matching Unicode characters with a quick & dirty regular expression:
var regex = /(\d+) ty.* temu, (\d+) (.*) (\d+) (\d{2}):(\d{2}):(\d{2})/
Parsing:
var result = dateStr.match(regex);
The result is:
[
"6 tygodni temu, 2012 kwiecień 18 15:08:18",
"6",
"2012",
"kwiecień",
"18",
"15",
"08",
"18"
]
I don't know Russian, but you might need to do some extra linguistic work. E.g. in Polish I have "1 tydzień" but "2 tygodnie" and even "5 tygodni" (mind the different form).
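To turn the captured groups into an actual Date, a dictionary of month names is probably the simplest route, as the question suggests. Below is a minimal sketch, not a definitive implementation: the names MONTHS, DATE_RE and parseLocalizedDate are made up for illustration, and the table assumes Polish months in the nominative form used in my sample string. For Russian you would swap in the Russian names in whatever grammatical form your input actually uses. Matching the "weeks ago" words with \S+ sidesteps the plural-form problem, since the absolute date already carries all the information.

```javascript
// Hypothetical lookup table: Polish month names (nominative) -> zero-based month index.
// Replace the keys with the Russian names that appear in your data.
var MONTHS = {
  "styczeń": 0, "luty": 1, "marzec": 2, "kwiecień": 3,
  "maj": 4, "czerwiec": 5, "lipiec": 6, "sierpień": 7,
  "wrzesień": 8, "październik": 9, "listopad": 10, "grudzień": 11
};

// \S+ matches any run of non-whitespace characters, so it works for any alphabet
// and for any plural form of "weeks ago". Assumes the "ago" phrase is two words.
var DATE_RE = /^\d+ \S+ \S+, (\d{4}) (\S+) (\d{1,2}) (\d{2}):(\d{2}):(\d{2})$/;

function parseLocalizedDate(text) {
  var m = DATE_RE.exec(text);
  if (!m) return null; // overall shape did not match

  var name = m[2].toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(MONTHS, name)) {
    return null; // month name not in the dictionary
  }

  // Builds the date in the local time zone; the unary + converts strings to numbers.
  return new Date(+m[1], MONTHS[name], +m[3], +m[4], +m[5], +m[6]);
}

var parsed = parseLocalizedDate("6 tygodni temu, 2012 kwiecień 18 15:08:18");
```

Returning null for anything unrecognized makes it easy to spot month spellings that are still missing from the dictionary.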
Upvotes: 2