I have a Twitter shortened URL (t.co) and I'm trying to use jsoup to send a request and parse its response. There should be three redirect hops before the final URL is reached, but that is not what happens with jsoup, even after setting followRedirects to true.
My code:
import java.io.IOException;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public class Main {
    public static void main(String[] args) {
        try {
            Response response = Jsoup.connect("https://t.co/sLMy6zi4Yw").followRedirects(true).execute();
            System.out.println(response.statusCode()); // prints 200
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
However, with Python's Requests library I get the response I expect:
import requests

response = requests.get('https://t.co/sLMy6zi4Yw', allow_redirects=False)
print(response.status_code)  # prints 301
I'm using jsoup version 1.11.2 and Requests version 2.18.4 with Python 3.5.2.
Does anybody have any insight into this?
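To work around this special case, you can remove the User-Agent header that jsoup sets by default (for some unknown/undocumented reason):
Connection connection = Jsoup.connect(url).followRedirects(true);
// Remove jsoup's default browser-like User-Agent so t.co replies with a real 301
connection.request().removeHeader("User-Agent");
Response response = connection.execute();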
Let's examine the raw requests and the server's behavior.
A request with a browser User-Agent (simulating a browser) returns 200 OK with an HTML page instead of a redirect:
curl example:
curl --include --raw "https://t.co/sLMy6zi4Yw" --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
Response:
HTTP/1.1 200 OK
cache-control: private,max-age=300
content-length: 257
content-security-policy: referrer always;
content-type: text/html; charset=utf-8
referrer-policy: unsafe-url
server: tsa_b
strict-transport-security: max-age=0
vary: Origin
x-response-time: 20
x-xss-protection: 1; mode=block; report=https://twitter.com/i/xss_report
<head><meta name="referrer" content="always"><noscript><META http-equiv="refresh" content="0;URL=http://bit.ly/2n3VDpo"></noscript><title>http://bit.ly/2n3VDpo</title></head><script>window.opener = null;location.replace("http:\/\/bit.ly\/2n3VDpo")</script>
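With a browser User-Agent, the next hop exists only inside this HTML (the meta refresh and the location.replace call). jsoup's followRedirects only follows HTTP 3xx responses, so it has nothing to follow here. If you would rather keep a browser User-Agent, one option (not part of the answer above) is to pull the target out of the meta refresh yourself. The following is a minimal stdlib sketch. It assumes the page keeps exactly this META http-equiv="refresh" content="0;URL=..." shape, with http-equiv written before content. The class and method names are hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the target URL from an HTML meta-refresh tag
// such as <META http-equiv="refresh" content="0;URL=http://example.com/">.
final class MetaRefresh {
    // Case-insensitive because t.co writes "META" and "URL" in upper case.
    // Assumption: http-equiv appears before content inside the tag, as on the t.co page.
    private static final Pattern REFRESH = Pattern.compile(
            "<meta\\s+[^>]*http-equiv\\s*=\\s*\"refresh\"[^>]*content\\s*=\\s*\"\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the refresh target, or empty if the page has no matching meta refresh.
    static Optional<String> target(String html) {
        Matcher m = REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Body copied from the 200 response above (script part omitted).
        String body = "<head><meta name=\"referrer\" content=\"always\"><noscript>"
                + "<META http-equiv=\"refresh\" content=\"0;URL=http://bit.ly/2n3VDpo\"></noscript>"
                + "<title>http://bit.ly/2n3VDpo</title></head>";
        System.out.println(target(body).orElse("no meta refresh found")); // prints http://bit.ly/2n3VDpo
    }
}
```

The regex only covers this one tag shape. For arbitrary pages, parsing the document (for example with jsoup itself) and reading the meta tag's content attribute would be more robust.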
A request without a browser User-Agent (curl then sends its own default, curl/<version>) returns a real 301 redirect:
curl example:
curl --include --raw "https://t.co/sLMy6zi4Yw"
Response:
HTTP/1.1 301 Moved Permanently
cache-control: private,max-age=300
content-length: 0
location: http://bit.ly/2n3VDpo
server: tsa_b
strict-transport-security: max-age=0
vary: Origin
x-response-time: 9
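Because the 301 carries the next hop in its location header, you can also walk the chain one hop at a time and count the three hops the question expects, instead of letting the client follow them silently. The following is a minimal stdlib sketch (not from the original answer), assuming Java 8+. RedirectChain, nextHop and walk are hypothetical names. The resolver is passed in as a function so the hop-walking logic can be exercised without network access:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: follows Location headers hop by hop and records every URL visited.
final class RedirectChain {
    // Performs one GET without auto-redirects; returns the absolute next URL, or null if not a 3xx.
    static String nextHop(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each 3xx ourselves
            // No User-Agent is set here, so HttpURLConnection sends its own default (non-browser) agent.
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status >= 300 && status < 400 && location != null) {
                return URI.create(url).resolve(location).toString(); // also handles relative Locations
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies 'resolver' starting at 'start' until it returns null or maxHops hops have been taken.
    static List<String> walk(String start, Function<String, String> resolver, int maxHops) {
        List<String> chain = new ArrayList<>();
        chain.add(start);
        String current = start;
        for (int i = 0; i < maxHops; i++) {
            String next = resolver.apply(current);
            if (next == null) {
                break; // final, non-redirect response reached
            }
            chain.add(next);
            current = next;
        }
        return chain;
    }

    public static void main(String[] args) {
        // Real network request; the actual chain depends on what the servers return today.
        List<String> chain = walk("https://t.co/sLMy6zi4Yw", RedirectChain::nextHop, 10);
        System.out.println((chain.size() - 1) + " redirect hop(s):");
        chain.forEach(System.out::println);
    }
}
```

The maxHops cap guards against redirect loops. Passing the resolver as a function also lets you swap in a jsoup-based one (for example, followRedirects(false) with the User-Agent removed) without changing the loop.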