Reputation: 77
I have stored the response from a web application in a string. The string contains several URLs and is dynamic: it could hold anything from 10 to 1000 URLs.
I work in performance engineering, but this time I have to code a plugin in Java, and I am far from an expert in programming.
The problem is that my response string contains a lot of gibberish that I don't need, and I don't know how to filter it out. In my print/request I only want to send the URLs.
I've come this far:
String responseData = "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65354-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment1_4_av.ts?null=" +
    "#EXTINF:10.000, " +
    "http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65365-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=" +
    "#EXTINF:fgsgsmoregiberish, " +
    "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-6353-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=";
String pattern = "^(http://.*\\.ts)";
Pattern pr = Pattern.compile(pattern);
Matcher matcher = pr.matcher(responseData);
if (matcher.find()) {
    System.out.println(matcher.group());
    // This prints everything from the response. I only want the URLs
    // (dynamic; the names can differ, but they all start with http and end with .ts).
} else {
    System.out.println("No match");
}
Upvotes: 0
Views: 74
Reputation: 1
Use the following regex pattern:
(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+([a-z]{2,4})(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%@\.\/_-]+))?(\?[0-9a-zA-Z\+\%@\/&\[\];=_-]+)?)?))\b
Explanation:
((http|ftp|https):\/{2})
Upvotes: 0
Reputation: 98921
Just make your regex lazy with .*? instead of greedy .*, i.e.:
pr = Pattern.compile("(https?.*?\\.ts)");
Regex demo:
https://regex101.com/r/nQ5pA7/1
Regex explanation:
(https?.*?\.ts)
Match the regex below and capture its match into backreference number 1 «(https?.*?\.ts)»
Match the character string “http” literally (case sensitive) «http»
Match the character “s” literally (case sensitive) «s?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “.” literally «\.»
Match the character string “ts” literally (case sensitive) «ts»
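To collect every URL rather than just the first one, the lazy pattern can be applied in a loop over find(). The following is only a minimal sketch: the class name LazyUrlExtractor, the extract helper, and the shortened example.invalid URLs are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class LazyUrlExtractor {
    // Lazy quantifier: each match stops at the first ".ts" after "http" or "https".
    private static final Pattern TS_URL = Pattern.compile("https?.*?\\.ts");

    static List<String> extract(String responseData) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = TS_URL.matcher(responseData);
        // find() resumes after the previous match, so the loop visits every URL.
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the response in the question (assumed data).
        String responseData =
            "http://example.invalid/a/segment1_4_av.ts?null=" +
            "#EXTINF:10.000, " +
            "http://example.invalid/a/segment2_4_av.ts?null=";
        for (String url : extract(responseData)) {
            System.out.println(url);
        }
    }
}
```

Note that the ^ anchor from the question is dropped here; with it, only a URL at the very start of the string could match.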
Upvotes: 0
Reputation: 89557
Depending on what your URLs look like, you can use this naive pattern, which works for your examples and stops before the ? (written as a Java string literal):
\\bhttps?://[^?\\s]+
To ensure there is .ts at the end, you can change it to:
\\bhttps?://[^?\\s]+\\.ts
or
\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)
to check that the end of the path is reached.
Note that these patterns don't deal with URLs that contain spaces between double quotes.
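As a rough sketch of how the last pattern could be used, the snippet below collects all matches into a list. The class name TsUrlFinder, the findAll helper, and the example.invalid sample data are assumptions for illustration, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class TsUrlFinder {
    // The path must end in ".ts", followed by "?", whitespace, or end of input.
    private static final Pattern TS_URL =
        Pattern.compile("\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)");

    static List<String> findAll(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = TS_URL.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group()); // the lookahead is not part of the match
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the response in the question (assumed data).
        String text =
            "http://example.invalid/a/segment1_4_av.ts?null=" +
            "#EXTINF:10.000, " +
            "http://example.invalid/a/segment2_4_av.ts?null=";
        for (String url : findAll(text)) {
            System.out.println(url);
        }
    }
}
```

The lookahead is what separates this from the shorter variant: a path such as /a/file.tsx is rejected because .ts is followed by x rather than by ?, whitespace, or the end of the input.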
Upvotes: 2