Yassin Hajaj

Reputation: 21965

URL in Java: Why is the part of the String after "+" not taken into account?

I'm working with URLs, more precisely with Stack Overflow URLs.

The URLs for tagged questions on the site have this structure:

/questions/tagged/tag+anotherTag+lastTag
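A quick offline check can confirm that Java itself keeps the "+" when such a URL is built. This is a minimal sketch; the class TaggedUrl and the helper taggedQuestionsUri are hypothetical names, and it assumes tag names contain only URI-safe characters:

```java
import java.net.URI;
import java.util.List;

public class TaggedUrl {

    // Hypothetical helper: builds /questions/tagged/tag+anotherTag+lastTag from a list of tags.
    // Assumes tag names contain only URI-safe characters (a tag such as "c#" would need escaping).
    static URI taggedQuestionsUri(List<String> tags) {
        return URI.create("https://stackoverflow.com/questions/tagged/" + String.join("+", tags));
    }

    public static void main(String[] args) {
        URI uri = taggedQuestionsUri(List.of("cobol", "hibernate"));
        // "+" is a legal path character, so java.net.URI keeps it unchanged
        System.out.println(uri.getPath()); // prints /questions/tagged/cobol+hibernate
    }
}
```

This only shows that the "+" survives on the Java side; it says nothing about how the server treats the request.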

When I fetch such a URL from Java, I only get the questions for the first tag.

Example

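import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class TaggedQuestions {
    public static void main(String[] args) {
        URL url = null;
        try {
            url = new URL("https://stackoverflow.com/questions/tagged/cobol+hibernate");
            // try-with-resources closes the reader and the underlying stream
            try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = br.readLine()) != null) {
                    // print only the tag containers of the question summaries
                    if (line.contains("<div class=\"tags")) {
                        System.out.println(line);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(url);
    }
}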

Output

<div class="tags t-cobol">
<div class="tags t-batch-file t-cobol t-mainframe t-vsam">
<div class="tags t-cobol t-mainframe">
<div class="tags t-cobol t-opencobol t-microfocus">
<div class="tags t-cobol">
https://stackoverflow.com/questions/tagged/cobol+hibernate

Expected Output

// Nothing because there is no question under both tags
https://stackoverflow.com/questions/tagged/cobol+hibernate

The actual link shows an empty page (no question has ever been posted with both tags together), but as you can see, the code only gets the questions tagged with the first tag.


cobol+hibernate is just an example that illustrates the problem well; I know it makes no sense to combine these two tags.

Upvotes: 2

Views: 109

Answers (1)

janos

Reputation: 124648

This curl command and its output shed some light:

$ curl 'http://stackoverflow.com/questions/tagged/cobol+hibernate'
<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="/questions/tagged/cobol">here</a>.</h2>
</body></html>

That is, the request is redirected, dropping the second tag.

Here is also an extract from the output of curl -v ...:

< HTTP/1.1 302 Found
< Cache-Control: private
< Content-Type: text/html; charset=utf-8
< Location: /questions/tagged/cobol
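The same 302 can be observed from Java. By default HttpURLConnection follows redirects, so url.openStream() in the question silently lands on the redirect target; turning that off with setInstanceFollowRedirects(false) exposes the status code and the Location header. A minimal sketch; RedirectCheck and redirectLocation are hypothetical names, and the site's current response may differ from the one captured above:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class RedirectCheck {

    // Requests the URL without following redirects and returns the Location header (null if absent)
    static String redirectLocation(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // HttpURLConnection follows redirects by default; disable that for this connection only
        conn.setInstanceFollowRedirects(false);
        try {
            System.out.println("HTTP " + conn.getResponseCode());
            return conn.getHeaderField("Location");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("https://stackoverflow.com/questions/tagged/cobol+hibernate").toURL();
        System.out.println("Location: " + redirectLocation(url));
    }
}
```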

It appears that you need some reputation to search for multiple tags at the same time. If I open http://stackoverflow.com/questions/tagged/cobol+hibernate in an incognito window (where I'm not logged in), the second and subsequent tags are dropped.
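Whether tags were dropped can be checked mechanically by comparing the requested path with the Location header of the redirect. A minimal sketch, assuming paths of the form /questions/tagged/tag+anotherTag+lastTag with no query string; DroppedTags, tagsOf and droppedTags are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DroppedTags {

    // Extracts the tags from a path of the form /questions/tagged/tag+anotherTag+lastTag
    static List<String> tagsOf(String path) {
        String prefix = "/questions/tagged/";
        int i = path.indexOf(prefix);
        if (i < 0) {
            return List.of();
        }
        return Arrays.asList(path.substring(i + prefix.length()).split("\\+"));
    }

    // Tags that were requested but no longer appear in the redirect target
    static List<String> droppedTags(String requestedPath, String location) {
        List<String> dropped = new ArrayList<>(tagsOf(requestedPath));
        dropped.removeAll(tagsOf(location));
        return dropped;
    }

    public static void main(String[] args) {
        // Values taken from the curl output in this answer
        System.out.println(droppedTags("/questions/tagged/cobol+hibernate", "/questions/tagged/cobol")); // prints [hibernate]
    }
}
```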

So if you want to run this query from Java, it appears that you need to log in programmatically.

I guess this is because searching for multiple tags can be a burden on the database, so the feature is restricted to experienced users. You can probably get a definitive answer on Meta Stack Exchange (MSE).

Upvotes: 2
