Reputation: 3
This is the problem:
I have some .csv files with travels info, and the dates appear like strings (each line for one travel):
I must parse the strings to Dates, and keep them into an array for each travel.
The problem is that I don't know how to do it. Even my univesrity teachers told me that they don't know how to do so :S. I can't find/create a pattern using http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
After parsing them i have to search all travels between two dates.
But how? How to parse them? it's possible?
Upvotes: 0
Views: 211
Reputation: 38071
This requires Natural Language Processing (NLP) , see Wikipedia for an account: http://en.wikipedia.org/wiki/Natural_language_processing.
Your problem as stated is very hard. There are many ways of representing a single date, and your examples include ranges of dates and formulae for generating dates. It sounds as if you have a limited subset of language - frequent use of "all", "from", etc.
If you are in control of the language (i.e. these are being generated by humans who comply with your documentation) then you have a chance of formalising it (although it will take a lot of work - months). If you are not in charge of it, then every time a new phrase appears you will have to add it to the specs.
I suggest you got through the file and look for stock phrases "All [weekdayname]s [from | between | until | before]". Or "in [January | February ...]". Then substitute these in in phrases. If you find this covers all the cases you may be able to extract particular phrases". But if you have anaphora like "next Tuesday" it will be much harder.
Upvotes: 1
Reputation: 85546
You're in the domain of NLP (Natural Language Processing), what is possible or impossible is fuzzy in this domain. From a fast Google search, I've found that the Natty Date Parser might be useful for you.
For more theory background on NLP, you might be interested in Natural Language Processing course of Stanford University on Coursera (at the moment the course is not open for enrolment, but lectures are available for free.
You can also use a set of strict regular expressions that would match only one of your possible cases and apply them from the most restrictive to the most relaxed.
The first thing I would define to attack your problem is what you expect as an output of your method, since in some cases it's a single date, in some cases an interval, in some others multiple intervals.
Upvotes: 1