Reputation: 35
I am writing a program which will look for mixed content on a page at a given URL. The aim of the script is to extract all links in the page, convert them to absolute links, and then check whether any of the content is mixed.
Let's say we have this page: https://www.example.com/xxx1/. I'm assuming that any link within this page will ALWAYS connect through to the HTTPS site, unless the link explicitly says otherwise?
E.g.
/index.html
= will be HTTPS
http://www.example.com/img/insecureImage.jpg
= Will be HTTP - and therefore insecure?
True?
Thanks,
Upvotes: 0
Views: 125
Reputation: 1894
Yes. Regardless of mixed content, a relative link is resolved against the URL of the page it appears in, so it inherits that page's scheme and host. In your example, /index.html should be interpreted as https://www.example.com/index.html.
For absolute links, determining whether it's mixed content works exactly as you suggest: check the URI scheme. To reference insecure content, even from the same server, a page has to use an absolute http:// link, which makes your task fairly easy.
You're on the right track.
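As a rough sketch of that approach in Python (standard library only): resolve every href/src against the page URL, then keep the ones whose scheme ends up as http. The names LinkCollector and find_insecure_references are made up for illustration, and this only looks at href and src attributes, so treat it as a starting point rather than a complete checker.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href/src values and resolves them against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attribute values can be None (e.g. a bare "href" with no value).
            if name in ("href", "src") and value:
                # urljoin leaves absolute URLs alone and resolves relative
                # ones against page_url, so they inherit its scheme.
                self.links.append(urljoin(self.page_url, value))


def find_insecure_references(page_url, html):
    """Return the resolved URLs that would be fetched over plain HTTP."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [url for url in collector.links if urlparse(url).scheme == "http"]
```

With the page from the question, /index.html resolves to an https:// URL and is not reported, while the absolute http:// image URL is.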
Upvotes: 1
Reputation: 97
The situation with mixed content depends on whether the content is active or passive. On an HTTPS site, active mixed content (such as scripts loaded over HTTP) will be blocked. Passive mixed content, as in the case of the image you provided, will be displayed by default, but users can choose in their browsers to block this too.
The example you give is of an image file, so that is passive mixed content and that would not be blocked by default, but could be by the user's settings as mentioned.
The following resources fit into that class:
The guide I link to explains the active/passive mixed content quite well.
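If you want your script to report the active/passive distinction too, something along these lines might work. It is a simplified sketch: the function name classify_mixed_content and the tag list are my own assumptions, and it treats images, audio and video as passive and everything else fetched over HTTP as active, which is coarser than what browsers actually do.

```python
from urllib.parse import urlparse

# Assumption: only these tags are treated as passive (display-only) content.
PASSIVE_TAGS = {"img", "audio", "video"}


def classify_mixed_content(tag, absolute_url):
    """Return 'passive', 'active', or None if the URL is not plain HTTP."""
    if urlparse(absolute_url).scheme != "http":
        return None  # https (or another scheme): not mixed content here
    return "passive" if tag.lower() in PASSIVE_TAGS else "active"
```

For the image in the question this returns 'passive'; a script loaded from an http:// URL would come back as 'active'.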
Upvotes: 1