I have a JavaScript string:
let entries = `23-05-1990 Some heading
27-05-1990 Liar Liar
29-05-1990 Another Heading
30-05-1990 50/50
31-05-1990 My day`
Using regex, I need to process this string and generate two arrays:
// 1) date array:
date = ["23-05-1990", "27-05-1990", "29-05-1990", "30-05-1990", "31-05-1990"]
// 2) headings array:
headings = ["Some heading", "Liar Liar", "Another Heading", "50/50", "My day"]
So far this is simple: split the string by line break, pass each date-heading line to a regex, extract the date and the heading, and append them to their respective arrays.
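A minimal sketch of that line-by-line approach, assuming every line is in the date-first "DD-MM-YYYY Heading" form (the inconsistent formats described next are not handled). The pattern is illustrative, not taken from the question; the arrays are named after the ones above.

```javascript
// Sketch: only handles lines of the form "DD-MM-YYYY Heading".
const entries = `23-05-1990 Some heading
27-05-1990 Liar Liar
29-05-1990 Another Heading
30-05-1990 50/50
31-05-1990 My day`;

const date = [];
const headings = [];

for (const line of entries.split(/\r?\n/)) {
  // Date at the start, one or more spaces, then the rest of the line is the heading.
  const m = /^(\d{2}-\d{2}-\d{4}) +(.+)$/.exec(line.trim());
  if (m) {
    date.push(m[1]);
    headings.push(m[2]);
  }
}

console.log(date);     // [ '23-05-1990', '27-05-1990', '29-05-1990', '30-05-1990', '31-05-1990' ]
console.log(headings); // [ 'Some heading', 'Liar Liar', 'Another Heading', '50/50', 'My day' ]
```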
But the issue is that the data doesn't have a consistent format. Some of it is in this format, i.e. the heading comes before the date:
`Liar Liar 27-05-1990
Another Heading 29-05-1990
50/50 30-05-1990
My day 31-05-1990 `
There may also be a separator between the heading and the date:
`23-05-1990 : Some heading
27-05-1990 : Yes Man`
`29-05-1990: Another Heading`
`30-05-1990 - 50/50
31-05-1990 - My day`
So the date and the heading will always be there (we don't know which one comes first), but the separator may or may not be present.
Also, the separator is one of these three: " " (space), "-", ":".
The heading can't start or end with any character other than a letter or a digit.
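That last rule is what lets a regex tell the heading apart from a separator. As a hypothetical illustration (the function name isValidHeading is made up, and "letter" is assumed to mean ASCII A-Z in either case), the rule fits in a single test:

```javascript
// Hypothetical helper: true when the heading starts and ends with an ASCII
// letter or a digit. A one-character heading such as "A" also passes.
function isValidHeading(heading) {
  return /^[A-Za-z\d](?:.*[A-Za-z\d])?$/.test(heading);
}

console.log(isValidHeading("Some heading")); // true
console.log(isValidHeading("50/50"));        // true
console.log(isValidHeading(": Yes Man"));    // false (starts with a separator)
console.log(isValidHeading("My day -"));     // false (ends with a separator)
```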
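You could match the following regular expression, using the m flag so that ^ and $ match at the start and end of each line (add the i flag if a heading may start with a lowercase letter). The date string will be in capture group 1 or 4 and the other will be empty. The heading will be in capture group 2 or 3 and the other will be empty.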
^(?:(\d{2}-\d{2}-\d{4}) *[-:]? *([A-Z\d].*)|([A-Z\d].*)(?<![ :-]) *[-:]? *(\d{2}-\d{2}-\d{4}))$
Replacing each match with "$1$4" therefore returns the date string, and replacing it with "$2$3" returns the heading.
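A sketch of applying the pattern to a multi-line string, assuming the g, m and i flags (the answer doesn't say which flags it used). String.prototype.replace substitutes an empty string for a group that did not participate, so the two replacement strings yield one date or one heading per line:

```javascript
// The answer's pattern. Flags are an assumption: g (every match),
// m (^ and $ match at each line), i ([A-Z] also matches lowercase).
const re = /^(?:(\d{2}-\d{2}-\d{4}) *[-:]? *([A-Z\d].*)|([A-Z\d].*)(?<![ :-]) *[-:]? *(\d{2}-\d{2}-\d{4}))$/gim;

// Mixed input: date first or heading first, with and without separators.
const entries = `23-05-1990 Some heading
Liar Liar 27-05-1990
29-05-1990: Another Heading
30-05-1990 - 50/50
My day 31-05-1990`;

// Per line, only one group of each pair is set; the unset one becomes "".
const date = entries.replace(re, "$1$4").split("\n");
const headings = entries.replace(re, "$2$3").split("\n");

console.log(date);     // [ '23-05-1990', '27-05-1990', '29-05-1990', '30-05-1990', '31-05-1990' ]
console.log(headings); // [ 'Some heading', 'Liar Liar', 'Another Heading', '50/50', 'My day' ]
```

Note that replace leaves non-matching lines untouched, so a malformed line would end up in both arrays; iterating with entries.matchAll(re) instead would skip it.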
JavaScript's regex engine performs the following operations. (In this list, the space in " *" is written as [ ]* to make it visible.)
^                     : assert beginning of string (of each line, with the m flag)
(?:                   : begin non-capture group
(\d{2}-\d{2}-\d{4})   : match date and save to capture group 1
[ ]*[-:]?[ ]*         : match 0+ spaces, optional '-' or ':', 0+ spaces
([A-Z\d].*)           : match heading and save to capture group 2
|                     : or
([A-Z\d].*)           : match heading and save to capture group 3
(?<![ :-])            : negative lookbehind asserts previous character is neither ' ', ':' nor '-'
[ ]*[-:]?[ ]*         : match 0+ spaces, optional '-' or ':', 0+ spaces
(\d{2}-\d{2}-\d{4})   : match date and save to capture group 4
)                     : end non-capture group
$                     : assert end of string (of each line, with the m flag)
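To see why the lookbehind matters, here is a sketch comparing the heading-first half of the pattern with and without (?<![ :-]) on a single line (the line "My day - 31-05-1990" is an invented test input). Without it, the greedy .* keeps the separator as part of the heading:

```javascript
// Heading-first alternative only, applied to a single line (i flag assumed).
const withLookbehind = /^([A-Z\d].*)(?<![ :-]) *[-:]? *(\d{2}-\d{2}-\d{4})$/i;
const withoutLookbehind = /^([A-Z\d].*) *[-:]? *(\d{2}-\d{2}-\d{4})$/i;

const line = "My day - 31-05-1990";

// Greedy .* backtracks only as far as needed, so " - " stays in the heading.
console.log(JSON.stringify(withoutLookbehind.exec(line)[1])); // "My day - "
// The lookbehind forbids the heading from ending in ' ', ':' or '-'.
console.log(JSON.stringify(withLookbehind.exec(line)[1]));    // "My day"
```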
This works, but it doesn't account for duplicates, so if that is a problem you can filter them out afterwards, or use key/value pairs instead of an array.
Part of the while loop was adapted from regex101.com.
const regexes = [
  // Date first: date, one or more separators (space, ':' or '-'), then the title.
  /^(?<date>\d{2}-\d{2}-\d{4})[ :\-]+(?<title>.*)$/gm,
  // Title first: the lazy title stops right before the separators preceding the date.
  /^(?<title>.*?)[ :\-]+(?<date>\d{2}-\d{2}-\d{4})$/gm
  // ^ and $ with the m flag match at every line, including the last one,
  // which has no line break after it.
];
const str = `23-05-1990 Some heading
27-05-1990 Liar Liar
29-05-1990 Another Heading
30-05-1990 50/50
31-05-1990 My day
Liar Liar 27-05-1990
Another Heading 29-05-1990
50/50 30-05-1990
My day 31-05-1990
23-05-1990 : Some heading
27-05-1990 : Yes Man
29-05-1990: Another Heading
30-05-1990 - 50/50
31-05-1990 - My day`;
const output = [];
regexes.forEach(regex => {
  let m;
  while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
      regex.lastIndex++;
    }
    output.push([m.groups.date.trim(), m.groups.title.trim()]);
  }
});
console.log(output);
Output is:
[
  [ '23-05-1990', 'Some heading' ],
  [ '27-05-1990', 'Liar Liar' ],
  [ '29-05-1990', 'Another Heading' ],
  [ '30-05-1990', '50/50' ],
  [ '31-05-1990', 'My day' ],
  [ '23-05-1990', 'Some heading' ],
  [ '27-05-1990', 'Yes Man' ],
  [ '29-05-1990', 'Another Heading' ],
  [ '30-05-1990', '50/50' ],
  [ '31-05-1990', 'My day' ],
  [ '27-05-1990', 'Liar Liar' ],
  [ '29-05-1990', 'Another Heading' ],
  [ '30-05-1990', '50/50' ],
  [ '31-05-1990', 'My day' ]
]
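As mentioned, duplicates are kept. A hypothetical post-processing step (dedupeAndSplit is not part of the answer) could drop repeated [date, title] pairs and split the rest into the two arrays the question asks for. It keys on the whole pair rather than on the date alone, because "Liar Liar" and "Yes Man" share 27-05-1990 and would otherwise collapse into one entry:

```javascript
// Hypothetical helper: removes repeated [date, title] pairs (first occurrence
// wins) and splits the remaining pairs into a date array and a headings array.
function dedupeAndSplit(pairs) {
  const unique = new Map(); // "date|title" -> [date, title], kept in insertion order
  for (const [d, title] of pairs) {
    // The date has a fixed format without "|", so this key is unambiguous.
    const key = `${d}|${title}`;
    if (!unique.has(key)) unique.set(key, [d, title]);
  }
  const kept = [...unique.values()];
  return {
    date: kept.map(([d]) => d),
    headings: kept.map(([, title]) => title),
  };
}

// A few pairs in the same shape as output above, including repeats.
const output = [
  ["23-05-1990", "Some heading"],
  ["27-05-1990", "Liar Liar"],
  ["23-05-1990", "Some heading"],
  ["27-05-1990", "Yes Man"],
  ["27-05-1990", "Liar Liar"],
];

const { date, headings } = dedupeAndSplit(output);
console.log(date);     // [ '23-05-1990', '27-05-1990', '27-05-1990' ]
console.log(headings); // [ 'Some heading', 'Liar Liar', 'Yes Man' ]
```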