Reputation: 1466
I have a url, like so:
https://www.example.com/exampletitle21sep11oct2020/index.html
The part I need is between the last and second-to-last '/' characters. But I don't need that whole part; I specifically need the last date before the last '/' character. As you can see, there are two dates right next to each other with no delimiter between them, which makes it very hard to use the substring or indexOf methods. What makes it even more difficult is that the first date contains only the day and month, while the last date contains the full date.
Is there some way for me to extract the last date before the last '/' character from this url?
Upvotes: 1
Views: 777
Reputation: 2614
Using a regex, you can get the second date as follows:
const regex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;
const [, date] = regex.exec("https://www.example.com/exampletitle21sep11oct2020/index.html");
console.log({ date })
// The same regex also handles a one-digit day (date2 is "9oct2020"):
const [, date2] = regex.exec("https://www.example.com/exampletitle21sep9oct2020/index.html");
console.log({ date2 });
// The year is optional in this pattern, so this logs "9oct":
console.log(regex.exec("https://www.example.com/exampletitle21sep9oct/index.html")[1]);
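One caveat with the snippets above: regex.exec returns null when nothing matches, so destructuring the result directly would throw on a URL without a date. A minimal sketch with a guard, assuming you want null back in that case (extractLastDate is a name I made up, not part of any API):

```javascript
// Hypothetical helper: returns the date right before the last '/', or null.
function extractLastDate(url) {
  // Same regex as in the answer above.
  const regex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;
  const match = regex.exec(url);
  // exec() returns null when there is no match, so guard before indexing.
  return match ? match[1] : null;
}

console.log(extractLastDate("https://www.example.com/exampletitle21sep11oct2020/index.html")); // "11oct2020"
console.log(extractLastDate("https://www.example.com/no-date-here/index.html")); // null
```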
Upvotes: 1
Reputation: 49
This is much simpler with a single regular expression:
var url = 'https://www.example.com/exampletitle21sep11oct2020/index.html'
var res = url.match( /.*?(\d+[a-z]+\d{4})\/.*?$/i );
// res === [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "11oct2020" ]
var endDate = res[1];
// endDate === "11oct2020"
or, to capture both dates separately (but the "exampletitle" part must not end with a digit):
var res = url.match( /.*?(\d+[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21sep", "11oct", "2020" ]
or:
var res = url.match( /.*?(\d+)([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21", "sep", "11", "oct", "2020" ]
But if you know that the day is always two digits (always "01", never "1"), the "exampletitle" part can be any string:
var res = url.match( /.*?(\d{2}[a-z]+\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2}[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2})([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
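If you would rather not count array indexes, named capture groups (ES2018) may read better. This is only a sketch: the group names day, month and year and the helper parseEndDate are my own labels, and the pattern assumes a 3-letter month and a 4-digit year directly before the last '/'.

```javascript
// Sketch only: matches the date that sits right before the last '/'.
const datePattern = /(?<day>\d{1,2})(?<month>[a-z]{3})(?<year>\d{4})\/[^\/]*$/i;

// Hypothetical helper: returns { day, month, year } as strings, or null.
function parseEndDate(url) {
  const match = url.match(datePattern);
  if (!match) return null; // no trailing date found
  const { day, month, year } = match.groups;
  return { day, month, year };
}

console.log(parseEndDate("https://www.example.com/exampletitle21sep11oct2020/index.html"));
// { day: "11", month: "oct", year: "2020" }
```

As with the patterns above, a title that ends in a digit followed by a one-digit day is ambiguous, so this cannot tell "5" + "9oct2020" from "59oct2020".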
Upvotes: 1
Reputation: 6967
Try this (updated):
const url = "https://www.example.com/exampletitle21sep11oct2020/index.html";
const urlData = url.split('/');
const datePart = urlData[urlData.length-2];
const res = datePart.slice(-9); // "11oct2020" (assumes the date is always 9 characters: 2-digit day, 3-letter month, 4-digit year)
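Note that slice(-9) only works while the date is exactly 9 characters long, so a one-digit day such as "9oct2020" would pick up an extra character. A possible variation, keeping the same split idea but matching the date at the end of the segment (lastDateFromUrl is a made-up name, and the 3-letter month and 4-digit year are assumptions):

```javascript
// Hypothetical helper: reads the second-to-last path segment,
// then takes the date at the very end of that segment.
function lastDateFromUrl(url) {
  const segments = new URL(url).pathname.split('/');
  const segment = segments[segments.length - 2] || '';
  // 1-2 digit day, 3-letter month, 4-digit year, anchored to the end.
  const match = segment.match(/\d{1,2}[a-z]{3}\d{4}$/i);
  return match ? match[0] : null;
}

console.log(lastDateFromUrl("https://www.example.com/exampletitle21sep11oct2020/index.html")); // "11oct2020"
console.log(lastDateFromUrl("https://www.example.com/exampletitle21sep9oct2020/index.html"));  // "9oct2020"
```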
Upvotes: 1
Reputation: 48600
You could find and parse the path that contains the following pattern:
^ Line start
.+ One or more of anything
(\d{2}) 2-digit day (start)
(\w{3}) 3-letter month (note that \w matches any word character, so case is not enforced)
(\d{2}) 2-digit day (end)
(\w{3}) 3-letter month
(\d{4}) 4-digit year
$ Line end
I used moment to handle parsing the dates.
const expression = /^.+(\d{2})(\w{3})(\d{2})(\w{3})(\d{4})$/;
const format = 'DD MMM YYYY';
const toTitleCase = (str) => str.charAt(0).toUpperCase() + str.slice(1);
const parseDates = (path) => {
  const url = new URL(path),
    tokens = url.pathname.split('/'),
    found = tokens.find(token => token.match(expression));
  if (!found) return null;
  const [
    , startDate, startMonth, endDate, endMonth, year
  ] = found.match(expression);
  return {
    start : moment(`${startDate} ${toTitleCase(startMonth)} ${year}`, format),
    end : moment(`${endDate} ${toTitleCase(endMonth)} ${year}`, format)
  };
};
const dates = parseDates('https://www.example.com/exampletitle21sep11oct2020/index.html');
console.log(dates);
<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.29.1/moment.min.js"></script>
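If you would rather not pull in moment just for this, the captured parts can be turned into a native Date. A rough sketch, assuming English 3-letter month abbreviations and that UTC midnight is an acceptable reading of the date (MONTHS and toDate are names I introduced for illustration):

```javascript
// Assumed lookup table of English month abbreviations.
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Hypothetical helper: builds a Date at UTC midnight, or null for an unknown month.
function toDate(day, month, year) {
  const monthIndex = MONTHS.indexOf(month.toLowerCase());
  if (monthIndex === -1) return null;
  return new Date(Date.UTC(Number(year), monthIndex, Number(day)));
}

console.log(toDate('11', 'oct', '2020').toISOString()); // "2020-10-11T00:00:00.000Z"
```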
Upvotes: 1