Mewster

Reputation: 1063

regex, find last part of a url

Let's take a URL like

www.url.com/some_thing/random_numbers_letters_everything_possible/set_of_random_characters_everything_possible.randomextension

If I want to capture "set_of_random_characters_everything_possible.randomextension", will [^/\n]+$ work? (solution taken from Trying to get the last part of a URL with Regex)

My question is: what does the "\n" part mean (it works even without it)? And is it safe if the URL contains an arbitrary combination of characters other than "/"?

Upvotes: 0

Views: 338

Answers (2)

Andy Lester

Reputation: 93656

First, please note that www.url.com/some_thing/random_numbers_letters_everything_possible/set_of_random_characters_everything_possible.randomextension is not a URL without a scheme like http:// in front of it.

Second, don't parse URLs yourself. What language are you using? You probably don't want to use a regex, but rather an existing module that has already been written, tested, and debugged.

If you're using PHP, you want the parse_url function.

If you're using Perl, you want the URI module.
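The same idea can be sketched in Python, purely as an illustration, since the question does not say which language is in use. This assumes the standard library's urllib.parse.urlsplit and posixpath.basename; the helper name last_segment is made up for the example.

```python
from urllib.parse import urlsplit
import posixpath


def last_segment(url: str) -> str:
    """Return the text after the last "/" in the URL's path.

    last_segment is a hypothetical helper, not a library function.
    """
    # urlsplit separates scheme, host, path, query and fragment,
    # so a "/" inside the query string or fragment is ignored.
    path = urlsplit(url).path
    # The path component of a URL always uses "/", so posixpath is the right tool.
    return posixpath.basename(path)


print(last_segment("http://www.url.com/some_thing/abc/file.randomextension"))
print(last_segment("http://www.url.com/a/b.ext?x=1/2#top"))
```

A path that ends in "/" yields an empty string here, which a caller would need to handle as appropriate.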

Upvotes: 2

Firas Dib

Reputation: 2621

Have a look at this explanation: http://regex101.com/r/jG2jN7

Basically what is going on here is "match any character besides a slash or a newline, one or more times, up to the end of the string". People insert \r\n into negated character classes because a negated character class matches anything except what has been inserted into it. So [^/] on its own would also match newlines.

For example, if there was a line break after the last slash in your text, [^/]+$ would capture the text on both sides of it, while [^/\n]+$ would capture only the part after the line break.

This does not matter in your case, however, since a URL does not normally contain line breaks. Note that the s-flag (PCRE_DOTALL) only changes what . matches; it has no effect on character classes.

TL;DR: You can leave it or remove it; it won't matter.
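A quick way to see the difference between the two patterns is to run both against a single-line and a multi-line string. The sketch below uses Python's re module, which is an assumption on my part since the question names no language; the sample strings and the helper last_part are made up for illustration.

```python
import re

with_newline = re.compile(r"[^/\n]+$")    # pattern from the question
without_newline = re.compile(r"[^/]+$")   # same pattern with \n removed


def last_part(pattern, text):
    """Return the matched text, or None if nothing matches (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Single-line input: both patterns capture the same thing.
url = "www.url.com/some_thing/abc123/file-name_01.randomextension"
print(last_part(with_newline, url))
print(last_part(without_newline, url))

# Input with a line break after the last slash: the results differ.
multi = "www.url.com/a/first\nsecond"
print(repr(last_part(with_newline, multi)))     # only the text after the line break
print(repr(last_part(without_newline, multi)))  # the text on both sides of the line break

# Input ending in a slash: neither pattern finds anything to capture.
print(last_part(with_newline, "www.url.com/a/"))
```

For ordinary single-line input the two patterns behave identically, so either one captures everything after the last "/".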

Ask away if anything is unclear or if I've explained it a little sloppily.

Upvotes: 1
