Colby

Reputation: 313

How to extract bracket data from string

I'm trying to extract the link marked 'rel="next"' from the string below. The problem is that the order of the entries can change, depending on whether a 'previous' or 'next' link exists. Because of that, I can't rely on a fixed position when using a regex or when splitting into a string array.

Here's the string:

<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=0&per_page=100>; rel="first",<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=20&per_page=100>; rel="last",<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=1&per_page=100>; rel="next"

And I need to get this string:

<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=1&per_page=100>; rel="next"

Here's a readable version:

<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=0&per_page=100>; rel="first",
<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=20&per_page=100>; rel="last",
<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=1&per_page=100>; rel="next"

Eventually I want to extract just the URL for the API request. I've tried splitting the string on ",", but the URL itself may contain a comma, so that is also unreliable. Thanks!

Upvotes: 1

Views: 91

Answers (2)

Maljam

Reputation: 6284

Assuming that the elements always start with "<http:", you could use a regex with positive lookahead:

String[] elements = str.split(",(?=<http:)");
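To get from the split elements to the actual link, you can loop over them and keep the one that ends with rel="next". This is only a rough sketch: the class name NextLinkBySplit, the method nextUrl and the example.com URLs are made up for illustration, and the split pattern is slightly widened to also accept "https" and optional whitespace after the comma.

```java
import java.util.Optional;

final class NextLinkBySplit {

    // Returns the URL of the element tagged rel="next", if there is one.
    static Optional<String> nextUrl(String header) {
        // Split only on commas that are followed by "<http:" or "<https:",
        // so commas inside a URL's query string are left alone.
        for (String element : header.split(",\\s*(?=<https?:)")) {
            String trimmed = element.trim();
            if (trimmed.endsWith("rel=\"next\"")) {
                int open = trimmed.indexOf('<');
                int close = trimmed.indexOf('>');
                if (open >= 0 && close > open) {
                    // Keep only the text between the angle brackets.
                    return Optional.of(trimmed.substring(open + 1, close));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical header with "next" in the middle.
        String header = "<http://example.com/search?qd=between:1,2&page=0>; rel=\"first\","
                + "<http://example.com/search?qd=between:1,2&page=1>; rel=\"next\","
                + "<http://example.com/search?qd=between:1,2&page=20>; rel=\"last\"";
        System.out.println(nextUrl(header).orElse("no next link"));
    }
}
```

Since each element is checked by its rel value, the position of the "next" entry in the string should not matter, and a missing "next" entry simply gives an empty Optional.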

Upvotes: 0

Pedro Lobito

Reputation: 99011

String myString = "<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=0&per_page=100>; rel=\"first\",<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=20&per_page=100>; rel=\"last\",<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=1&per_page=100>; rel=\"next\"";
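// Needs java.util.regex.Pattern, Matcher and PatternSyntaxException.
try {
    // Note: this pattern assumes the rel="next" entry comes right after rel="last", at the end of the string.
    Pattern regex = Pattern.compile("\"last\",(.*?)$");
    Matcher regexMatcher = regex.matcher(myString);
    if (regexMatcher.find()) {
        String next = regexMatcher.group(1);
        System.out.println(next);
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}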

//<http://v4-api.prod.emailanalyst.com/v4/competitive/search?Authorization={API_KEY}&mobileReady=true&qd=between:20150101000000,20150101060000&onlyCommercial=true&hasCreative=true&page=1&per_page=100>; rel="next"

REGEX EXPLANATION:

"last",(.*?)$

Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Greedy quantifiers

Match the character string “"last",” literally (case sensitive) «"last",»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of the string, or before the line break at the end of the string, if any (line feed) «$»

DEMO: http://ideone.com/7mITYJ
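
The pattern "last",(.*?)$ only finds the link when the "next" entry comes directly after "last" at the end of the string. If the order can change, one possible variation is to match the angle-bracket part that sits directly in front of rel="next" and capture just the URL. This is a sketch under assumptions: the class name NextLinkByRegex, the method nextUrl and the example.com URLs are hypothetical, and it assumes URLs never contain "<" or ">".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NextLinkByRegex {

    // "<", then anything except angle brackets (captured), then ">; rel="next"".
    private static final Pattern NEXT_LINK =
            Pattern.compile("<([^<>]*)>;\\s*rel=\"next\"");

    // Returns the URL tagged rel="next", or null if there is none.
    static String nextUrl(String header) {
        Matcher m = NEXT_LINK.matcher(header);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical header with "next" listed first.
        String header = "<http://example.com/search?qd=between:1,2&page=1>; rel=\"next\","
                + "<http://example.com/search?qd=between:1,2&page=0>; rel=\"first\","
                + "<http://example.com/search?qd=between:1,2&page=20>; rel=\"last\"";
        System.out.println(nextUrl(header));
    }
}
```

Because [^<>]* cannot cross a ">" character, the match cannot start in one entry and end in another, so commas inside the URL are harmless here.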

Upvotes: 1
