Extracting URL part (blog name) with Groovy

I am working with the following URL: http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2

I am trying to extract the name of the blog (stephania-bell).

I have implemented the following function to extract the expected value from the URL:

def getBlogName( def decodeUrl )
{
    def urlParams = this.paramsParser.parseURIToMap( URI.create( decodeUrl ) )
    def temp = decodeUrl.replace( "http://www.espn.com", "" )
            .replaceAll( "(/_/|\\?).*", "" )
            .replace( "/index", "" )
            .replace( "/insider", "" )
            .replace( "/post", "" )
            .replace( "/tag", "" )
            .replace( "/category", "" )
            .replace( "/", "" )
            .replace( "/blog/", "" )
    def blogName = temp.replace( "/", "" )
    return blogName
}

However, I am missing something: the value it returns is blogstephania-bell. Could you please help me understand what I am missing in the function implementation? Or is there a better way of doing the same thing?
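The output blogstephania-bell follows from the order of the calls: .replace( "/", "" ) runs before .replace( "/blog/", "" ), so by the time the latter executes no slashes are left and "/blog/" can never match. A minimal sketch of the same function with "/blog/" stripped first is shown here. The unused urlParams line (which depends on the asker's own paramsParser) is omitted, and the typed String signature is an assumption.

```groovy
// Sketch of the original function with the replace calls reordered:
// "/blog/" is removed BEFORE the catch-all .replace("/", ""),
// otherwise "/blog/stephania-bell" collapses to "blogstephania-bell".
String getBlogName(String decodeUrl) {
    decodeUrl.replace("http://www.espn.com", "")
             .replaceAll("(/_/|\\?).*", "")   // drop everything from "/_/" or "?" onward
             .replace("/blog/", "")           // must run while the slashes are still there
             .replace("/index", "")
             .replace("/insider", "")
             .replace("/post", "")
             .replace("/tag", "")
             .replace("/category", "")
             .replace("/", "")                // remove any remaining slashes last
}
```

This keeps the substring-replacement approach, which stays fragile if a blog name ever contains one of the removed fragments; parsing the path, as in the answers, avoids that.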

Upvotes: 2

Answers (3)

Jayan

Reputation: 18459

It may be more useful to parse the string with Java's URL class. Then:

extract the path as a String using getPath()
split it into segments on the path delimiter with split("/")
pick the relevant segment by array index, pathSegments[2] (index 0 is the empty string before the leading "/", index 1 is "blog")

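String plainText = "http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2"

def url = plainText.toURL()              // Groovy's String.toURL() returns a java.net.URL
def fullPath = url.getPath()             // "/blog/stephania-bell/post/_/id/3563/..."
def pathSegments = fullPath.split("/")   // ["", "blog", "stephania-bell", "post", ...]
assert "stephania-bell" == pathSegments[2]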
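Indexing with pathSegments[2] assumes the blog name is always the second path segment. A slightly more defensive sketch of the same idea, assuming only that the blog name directly follows a "blog" segment, searches for that segment instead and returns null when it is absent. The helper name blogNameFromUrl is hypothetical, and java.net.URI is used in place of URL because only parsing is needed.

```groovy
// Hypothetical helper: returns the path segment that directly follows "blog",
// or null if the path contains no such segment.
String blogNameFromUrl(String rawUrl) {
    // URI.getPath() excludes scheme, host, query string and fragment;
    // findAll { it } drops empty strings such as the one before the leading "/"
    List<String> segments = new URI(rawUrl).path.split('/').toList().findAll { it }
    int i = segments.indexOf('blog')
    (i >= 0 && i + 1 < segments.size()) ? segments[i + 1] : null
}
```

For the URL in the question this returns stephania-bell, and for a non-blog path such as http://www.espn.com/nfl/story it returns null instead of throwing.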

Upvotes: 2

Szymon Stepniak

Reputation: 42204

This kind of job is easily handled by a regular expression. To extract the part of the URL between http://www.espn.com/blog/ and the next /, the following code does the trick:

import java.util.regex.Pattern

def url = 'http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2'

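// Group 1 captures everything after "/blog/" up to (not including) the next "/"
def pattern = Pattern.compile('^https?://www\\.espn\\.com/blog/([^/]+)/.*$')

// (url =~ pattern)[0] is the first match as a list: [whole match, group 1]
def (_, blog) = (url =~ pattern)[0]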

assert blog == 'stephania-bell'
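The pattern above requires a / after the blog name, and indexing the matcher with [0] fails when nothing matches. If URLs such as http://www.espn.com/blog/stephania-bell (nothing after the name) also need handling, a variant could stop at the first /, ? or # and return null on no match. This is a sketch; blogNameOrNull is a hypothetical name.

```groovy
// Hypothetical helper: the blog name is whatever follows "/blog/" up to the
// next "/", "?" or "#" (or the end of the string); returns null when nothing matches.
String blogNameOrNull(String url) {
    def matcher = url =~ '^https?://www\\.espn\\.com/blog/([^/?#]+)'
    matcher.find() ? matcher.group(1) : null
}
```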

Upvotes: 1

tim_yates

Reputation: 171114

Not what you asked, but just for fun (I thought this was what you wanted at first), here is how to read the blog's display name from the page itself with jsoup:

@Grab('org.jsoup:jsoup:1.11.3')
import static org.jsoup.Jsoup.connect

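// Fetch the page and read the blog title from its sticky header.
// The CSS selector depends on ESPN's markup and may stop matching if the page changes.
def name = connect('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2')
  .get()
  .select('.sticky-header h1 a')
  .text()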

assert name == 'Stephania Bell Blog'

Upvotes: 2
