instanceof

Reputation: 1466

Get part of substring before character

I have a URL like this:

https://www.example.com/exampletitle21sep11oct2020/index.html

The part I need is between the last and second-to-last '/' characters. I don't need that whole segment, only the last date before the final '/'. As you can see, two dates sit right next to each other with no delimiter between them, which makes it hard to use substring or indexOf. To make it harder, the first date contains only the day and month, while the last date contains the day, month, and year.

Is there some way for me to extract the last date before the last '/' character from this url?

Upvotes: 1

Views: 777

Answers (4)

Aadil Mehraj

Reputation: 2614

Using a regex, you can get the second date as follows:

const regex = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;

// Two-digit day, three-letter month, four-digit year.
const [, date] = regex.exec("https://www.example.com/exampletitle21sep11oct2020/index.html");
console.log({ date }); // date === "11oct2020"

// The same regex also handles a one-digit day...
const [, shortDayDate] = regex.exec("https://www.example.com/exampletitle21sep9oct2020/index.html");
console.log({ shortDayDate }); // shortDayDate === "9oct2020"

// ...and a missing year.
console.log(regex.exec("https://www.example.com/exampletitle21sep9oct/index.html")[1]); // "9oct"
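
One caveat with the snippet above is that exec() returns null when the URL contains no date, so destructuring the result would throw. A minimal sketch of a guarded wrapper follows; the names LAST_DATE and extractLastDate are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: the regex from this answer, wrapped with a null check.
const LAST_DATE = /\/(?:.*?(\d{1,2}\w{3}\d{0,4}))\/.*?$/;

function extractLastDate(url) {
  const match = LAST_DATE.exec(url);
  // exec() returns null when nothing matches, so guard before indexing.
  return match ? match[1] : null;
}

console.log(extractLastDate("https://www.example.com/exampletitle21sep11oct2020/index.html")); // "11oct2020"
console.log(extractLastDate("https://www.example.com/about/index.html")); // null
```

Returning null keeps the "no date found" case explicit, so the caller decides how to handle it.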

Upvotes: 1

devdb

Reputation: 49

This is much simpler with a single regular expression:

var url = 'https://www.example.com/exampletitle21sep11oct2020/index.html'

var res = url.match( /.*?(\d+[a-z]+\d{4})\/.*?$/i );
// res === [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "11oct2020" ]
var endDate = res[1];
// endDate === "11oct2020"

or, to capture both dates (here the "exampletitle" must not end with a digit):

var res = url.match( /.*?(\d+[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21sep", "11oct", "2020" ]

or:

var res = url.match( /.*?(\d+)([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
// [ "https://www.example.com/exampletitle21sep11oct2020/index.html", "21", "sep", "11", "oct", "2020" ]

But if you know that the first day is always two digits (always "01", never "1"), the "exampletitle" can be any string:

var res = url.match( /.*?(\d{2}[a-z]+\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2}[a-z]+)(\d+[a-z]+)(\d{4})\/.*?$/i );
var res = url.match( /.*?(\d{2})([a-z]+)(\d+)([a-z]+)(\d{4})\/.*?$/i );
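
The positional results above can be harder to read once there are five groups. As a sketch only, the five-group variant could be written with named capture groups (ES2018); the group names and the parseRange helper are illustrative assumptions, not part of the original answer. The same caveat applies: with \d+ for the first day, the title must not end with a digit.

```javascript
// Five-group regex with named groups (ES2018). Group names are illustrative.
const RANGE = /(?<startDay>\d+)(?<startMonth>[a-z]+)(?<endDay>\d+)(?<endMonth>[a-z]+)(?<year>\d{4})\/[^/]*$/i;

function parseRange(url) {
  const match = url.match(RANGE);
  // Copy the groups into a plain object, or return null when nothing matches.
  return match ? { ...match.groups } : null;
}

const range = parseRange("https://www.example.com/exampletitle21sep11oct2020/index.html");
console.log(range.endDay + range.endMonth + range.year); // "11oct2020"
```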

Upvotes: 1

Nooruddin Lakhani

Reputation: 6967

Try this (updated):

const url = "https://www.example.com/exampletitle21sep11oct2020/index.html";
const urlData = url.split('/');
const datePart = urlData[urlData.length-2];
const res = datePart.slice(-9); // "11oct2020" (assumes a two-digit day and a four-digit year)
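
A fixed slice(-9) only works when the last date is exactly nine characters long. With a one-digit day, as in "exampletitle21sep9oct2020", it would return "p9oct2020". A possible variation, sketched here under the assumption that the date always ends the second-to-last path segment, anchors a small regex at the end of that segment instead. The function name lastDateInPath is hypothetical.

```javascript
// Hypothetical variation: take the second-to-last path segment,
// then match a trailing date with a one- or two-digit day.
function lastDateInPath(url) {
  const segments = new URL(url).pathname.split("/");
  const segment = segments[segments.length - 2] || "";
  const match = segment.match(/\d{1,2}[a-z]{3}\d{4}$/i);
  return match ? match[0] : null;
}

console.log(lastDateInPath("https://www.example.com/exampletitle21sep11oct2020/index.html")); // "11oct2020"
console.log(lastDateInPath("https://www.example.com/exampletitle21sep9oct2020/index.html")); // "9oct2020"
```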

Upvotes: 1

Mr. Polywhirl

Reputation: 48600

You could find and parse the path segment that matches the following pattern:

^         Start of string
.+        One or more of any character
(\d{2})   2-digit start day
(\w{3})   3-character start month (\w matches any word character)
(\d{2})   2-digit end day
(\w{3})   3-character end month (\w matches any word character)
(\d{4})   4-digit year
$         End of string

Example

I used Moment.js to parse the dates.

const expression = /^.+(\d{2})(\w{3})(\d{2})(\w{3})(\d{4})$/;
const format = 'DD MMM YYYY';
const toTitleCase = (str) => str.charAt(0).toUpperCase() + str.slice(1);

const parseDates = (path) => {
  const url    = new URL(path),
        tokens = url.pathname.split('/'),
        found  = tokens.find(token => token.match(expression));
  if (!found) return null;
  const [
    , startDate, startMonth, endDate, endMonth, year
  ] = found.match(expression);
  return {
    start : moment(`${startDate} ${toTitleCase(startMonth)} ${year}`, format),
    end   : moment(`${endDate} ${toTitleCase(endMonth)} ${year}`, format)
  };
};

const dates = parseDates('https://www.example.com/exampletitle21sep11oct2020/index.html');

console.log(dates);

The snippet loads Moment.js in its HTML:

<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.29.1/moment.min.js"></script>
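
If adding a library is not an option, the same parsing can be sketched with the built-in Date object. This is a rough equivalent, not a drop-in replacement: it assumes English three-letter month abbreviations, interprets the dates as UTC midnight, and reuses the end date's year for the start date, as the example above does. MONTHS, toUtcDate and parseDateRange are hypothetical names.

```javascript
// Month abbreviations in calendar order; the index doubles as the JS month number.
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Build a UTC date, or return null for an unknown month abbreviation.
function toUtcDate(day, month, year) {
  const monthIndex = MONTHS.indexOf(month.toLowerCase());
  if (monthIndex === -1) return null;
  return new Date(Date.UTC(Number(year), monthIndex, Number(day)));
}

// Find the first path segment ending in <dd><mon><dd><mon><yyyy>.
function parseDateRange(url) {
  const pattern = /(\d{2})([a-z]{3})(\d{2})([a-z]{3})(\d{4})$/i;
  for (const segment of new URL(url).pathname.split("/")) {
    const match = segment.match(pattern);
    if (match) {
      const [, startDay, startMonth, endDay, endMonth, year] = match;
      return {
        start: toUtcDate(startDay, startMonth, year),
        end: toUtcDate(endDay, endMonth, year),
      };
    }
  }
  return null;
}

const parsed = parseDateRange("https://www.example.com/exampletitle21sep11oct2020/index.html");
console.log(parsed.end.toISOString()); // "2020-10-11T00:00:00.000Z"
```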

Upvotes: 1
