user1054583

Reputation: 95

Regex for Newbie

I'm new to regex and trying to figure something out for use in Scala.

I'm trying to identify URLs within a very long string. I've looked around a lot, and the best I've found is:

val regex = """https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?""".r

That leaves something to be desired, however: it leaves things like ">Images at the end of a match. I'm trying to figure out what the heck my regex means so I can dissect it and make it stop when it hits a non-word character after the TLD in .com/.org/.edu/.whatever.
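The following Scala sketch reproduces the problem. It assumes a hypothetical HTML fragment whose link carries a query string; the object and variable names are made up for illustration. Because \S matches any non-whitespace character, including quotes and angle brackets, the query-string group keeps consuming past the end of the href value:

```scala
object OriginalRegexDemo {
  // The regex quoted in the question, unchanged.
  val regex = """https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?""".r

  // Hypothetical input: a link whose URL has a query string.
  val html = """<a href="http://example.com/search?q=cats">Images</a> more text"""

  // findFirstIn returns the leftmost match, if there is one.
  val firstMatch: Option[String] = regex.findFirstIn(html)

  def main(args: Array[String]): Unit =
    println(firstMatch) // Some(http://example.com/search?q=cats">Images</a>)
}
```

The match only ends at the space after </a>, because that space is the first character \S refuses.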

I was hoping someone wouldn't mind explaining what the individual elements of this pre-formed regex do, so that I can figure out what's going on and learn more about regex. I've gone through a tutorial or two and picked up a few things, but I think what I'm asking for would be invaluable to me right now.

I get that:

I don't get:

Anyway, I was hoping someone could mentor me through this one question, explaining individual elements as they come up, rather than pointing me to yet another tutorial. I'd appreciate it.
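One way to get an element-by-element breakdown is to rewrite the first regex in Java's comments mode, which Scala's .r supports because scala.util.matching.Regex is built on java.util.regex.Pattern. With the embedded (?x) flag, whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so each piece can be annotated on its own line. This is a sketch only; the object name and sample string are hypothetical, and the check at the end simply confirms the annotated version matches the same text as the one-liner:

```scala
object AnnotatedRegex {
  // The question's regex in comments mode: (?x) makes the engine ignore
  // whitespace and treat "#" through end of line as a comment.
  val verbose = """(?x)
    https?://          # literal "http", an optional "s", then "://"
    ([-\w\.]+)+        # host: hyphens, word chars [a-zA-Z0-9_] and dots
    (:\d+)?            # optional port: ":" followed by one or more digits
    (/                 # optional path, which must start with "/"
      ([\w/_\.]*       #   word chars, "/", "_" and "." ("_" is already in \w)
        (\?\S+)?       #   optional query: "?" then any run of non-whitespace
      )?
    )?
  """.r

  // The same pattern as a one-liner, exactly as posted.
  val compact = """https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?""".r

  // Hypothetical sample with a port, a path and a query string.
  val sample = "go to http://example.com:8080/a/b.html?x=1 now"

  def main(args: Array[String]): Unit = {
    println(verbose.findFirstIn(sample)) // Some(http://example.com:8080/a/b.html?x=1)
    println(compact.findFirstIn(sample)) // Some(http://example.com:8080/a/b.html?x=1)
  }
}
```

Read top to bottom, the annotated form also shows where the trouble lies: (\?\S+) is the only part that admits characters such as " and >.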

regexlib was helpful, and it got me this:

val regex = """https?://\w+\.\w+\.\w+[\w/_\.\?=&:]+""".r

every bit of which I understand!
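A quick Scala check of this pattern on a hypothetical input string (names are illustrative). The final character class stops cleanly at the quote, but because the pattern requires at least three dot-separated word runs after the scheme, a two-part host such as example.com is not matched at all:

```scala
object RegexlibDemo {
  // The regexlib-derived pattern from the question, unchanged:
  //   https?://       "http" or "https", then "://"
  //   \w+\.\w+\.\w+   three dot-separated word runs, e.g. www.example.com
  //   [\w/_\.\?=&:]+  one or more path/query characters; quotes, angle
  //                   brackets and whitespace are not in the set
  val regex = """https?://\w+\.\w+\.\w+[\w/_\.\?=&:]+""".r

  // Hypothetical input: one three-part host and one two-part host.
  val text = """see http://www.example.com/a?b=1">x and http://example.com/y"""

  // findAllIn yields every non-overlapping match, left to right.
  val found: List[String] = regex.findAllIn(text).toList

  def main(args: Array[String]): Unit =
    println(found) // List(http://www.example.com/a?b=1)
}
```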

Upvotes: 0

Views: 335

Answers (1)

Neil Essy

Reputation: 3607

I think your main problem, ">Images being included in the match, is solved by replacing the part that matches the URL's query string:

(\?\S+)

with a character class that excludes ", < and >, all of which \S matches:

(\?[\w=$&.\-^@#~+%]+)
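A minimal Scala sketch of this suggestion, assuming the substitution is made inside the question's original regex and that the input is the hypothetical HTML fragment shown (object and variable names are illustrative only). The query-string group now stops at the closing quote of the href:

```scala
object FixedQueryDemo {
  // The question's regex with (\?\S+) replaced by (\?[\w=$&.\-^@#~+%]+):
  // the query may only contain word characters and = $ & . - ^ @ # ~ + %,
  // so quotes and angle brackets end the match.
  val fixed = """https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?[\w=$&.\-^@#~+%]+)?)?)?""".r

  // Hypothetical input: a link whose URL has a query string.
  val html = """<a href="http://example.com/search?q=cats">Images</a> more text"""

  val firstMatch: Option[String] = fixed.findFirstIn(html)

  def main(args: Array[String]): Unit =
    println(firstMatch) // Some(http://example.com/search?q=cats)
}
```

If your query strings can legitimately contain other characters (for example / or ?, which this class omits), they would need to be added to the class.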

Upvotes: 2
