user1645023

Extract Url From a String

I have a URL:

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"

There are some unwanted characters, such as A or TRE, at the end. I want to remove them so that the URL looks like this:

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htm"

How can I remove them?

Upvotes: 0

Views: 151

Answers (1)

waldyr.ar

Reputation: 15244

If your URL always ends with .htm, .aspx or .php, you can solve it with a simple regex:

url = url[/^(.+\.(htm|aspx|php))(:?.*)$/, 1]

The pattern can be tested on Rubular.

First, I use String#[] to get a substring; it works like slice. Then comes the regex, explained from left to right:

^                   # Start of line
  (                   # Group 1: the part of the URL to keep
    .+                  # 1 or more of any character
    \.                  # Followed by a literal dot
    (htm|aspx|php)      # htm, aspx or php
  )                   # Close group 1
  (                   # Capture the unwanted trailing part
    :?                  # An optional literal colon
    .*                  # 0 or more of any character
  )                   # Close the unwanted part
$                   # End of line
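
To make the idea above easier to try out, here is a minimal sketch that wraps the same approach in a small method. The method name strip_trailing_junk and the constant KNOWN_EXTENSIONS are made up for illustration, and the pattern is slightly simplified: it assumes the string holds a single URL and that only the three extensions listed above matter.

```ruby
# Hypothetical helper, not part of the original answer.
# Keeps everything up to and including the last ".htm", ".aspx" or ".php".
KNOWN_EXTENSIONS = /\A(.+\.(?:htm|aspx|php))/

def strip_trailing_junk(url)
  # String#[] with a regex and a group index returns that group,
  # or nil when nothing matches; fall back to the input in that case.
  url[KNOWN_EXTENSIONS, 1] || url
end

puts strip_trailing_junk("http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA")
puts strip_trailing_junk("http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmTRE")
```

Both calls should print the URL ending in story01.htm. Keep in mind that a URL ending in .html would be cut back to .htm with this pattern, so add html to the alternation (before htm) if you expect those.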

Upvotes: 2