Reputation: 106
I have to write a program that checks whether a website has mixed content, but I'm not sure how to identify mixed content. Is there an exact definition of mixed content?
I know that mixed content can come from elements like these:
<img src="$unsafeContent">
<script src="$unsafeContent">
<object data="$unsafeContent">
<audio src="$unsafeContent">
<video src="$unsafeContent">
<form action="$unsafeContent">
<iframe src="$unsafeContent">
<embed src="$unsafeContent">
<source src="$unsafeContent">
<param value="$unsafeContent">
<a href="$unsafeContent">
But what about URLs built from concatenated strings in JavaScript? I can't recognize them easily. Do I have to download the scripts and check their content too? The same problem applies to CSS files. And what about iframes or anchors? Do I have to check the pages they point to as well, or only the destination URL?
Upvotes: 0
Views: 2033
Reputation: 106
Thank you for your hints. I've now found a good solution: use the npm module "chrome-remote-interface". With it you can get information about mixed content from headless Chrome through the DevTools debugging protocol, as described here: https://chromedevtools.github.io/devtools-protocol/tot/Security/
There is also a Java solution: https://github.com/webfolderio/cdp4j (don't forget to check its license if you choose it).
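A rough sketch of the filtering step in Python, assuming you have already captured raw DevTools Protocol event messages as JSON strings with some CDP client (chrome-remote-interface, cdp4j or another) after enabling the Network domain. It does not open a connection itself. It also uses the Network domain rather than the Security domain linked above: as far as I can tell from the protocol docs, the Request object in Network.requestWillBeSent carries an optional mixedContentType field ("blockable", "optionally-blockable" or "none"). The function name mixed_content_requests is made up for this example.

```python
from __future__ import annotations

import json
from typing import Iterable


def mixed_content_requests(messages: Iterable[str]) -> list[tuple[str, str]]:
    """Return (url, mixedContentType) for each request flagged as mixed content.

    Each message is assumed to be one raw DevTools Protocol event serialized
    as JSON, captured after the Network domain was enabled.
    """
    flagged: list[tuple[str, str]] = []
    for raw in messages:
        event = json.loads(raw)
        # Only requestWillBeSent events carry the outgoing Request object.
        if event.get("method") != "Network.requestWillBeSent":
            continue
        request = event.get("params", {}).get("request", {})
        # mixedContentType is optional; treat a missing value like "none".
        kind = request.get("mixedContentType", "none")
        if kind != "none":
            flagged.append((request.get("url", ""), kind))
    return flagged


# Example with two hand-written events (not captured from a real page).
events = [
    json.dumps({"method": "Network.requestWillBeSent",
                "params": {"request": {"url": "http://example.com/a.png",
                                       "mixedContentType": "optionally-blockable"}}}),
    json.dumps({"method": "Network.requestWillBeSent",
                "params": {"request": {"url": "https://example.com/app.js",
                                       "mixedContentType": "none"}}}),
]
print(mixed_content_requests(events))
# [('http://example.com/a.png', 'optionally-blockable')]
```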
Upvotes: 0
Reputation: 742
Definition of mixed content
Mixed content occurs when initial HTML is loaded over a secure HTTPS connection, but other resources (such as images, videos, stylesheets, scripts) are loaded over an insecure HTTP connection. This is called mixed content because both HTTP and HTTPS content are being loaded to display the same page, and the initial request was secure over HTTPS.
Mixed content degrades the security and user experience of your HTTPS site.
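To make the definition concrete, here is a small Python check (a sketch; is_mixed_content and INSECURE_SCHEMES are hypothetical names). It resolves relative and protocol-relative URLs against the page first, because something like //cdn.example.com/lib.js inherits HTTPS from the page and is therefore not mixed content. Treating ws:// as insecure alongside http:// is my assumption about what you want to flag; browsers do block unencrypted WebSocket connections from HTTPS pages.

```python
from urllib.parse import urljoin, urlsplit

# Schemes treated as insecure here (assumption: plain HTTP plus
# unencrypted WebSockets); extend the set if you need more.
INSECURE_SCHEMES = {"http", "ws"}


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """Return True if loading resource_url from page_url would be mixed content."""
    # Mixed content only exists when the page itself was loaded over HTTPS.
    # urlsplit() already lowercases the scheme.
    if urlsplit(page_url).scheme != "https":
        return False
    # Resolve relative ("img/a.png") and protocol-relative ("//cdn/x.js")
    # references against the page; both end up with the page's https scheme.
    resolved = urljoin(page_url, resource_url.strip())
    return urlsplit(resolved).scheme in INSECURE_SCHEMES


page = "https://example.com/index.html"
print(is_mixed_content(page, "http://cdn.example.com/a.png"))  # True
print(is_mixed_content(page, "//cdn.example.com/a.png"))       # False
```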
How to detect mixed content
Modern browsers display warnings about this type of content to tell the user that the page contains insecure resources, so these warnings are a good way to verify that your detection program works correctly.
For example, in the Network tab of Chrome DevTools (F12), a request for insecure content that the browser blocked shows the status (blocked:mixed-content).
Detecting mixed content basically means detecting content that is not loaded over HTTPS. Checking the tags you mention is fairly easy: you can run a regex or an XPath query over the HTML. The hard part is detecting dynamically loaded content (e.g. XMLHttpRequest calls); for that you must actually let the JavaScript on the page run. A browser automation tool such as Selenium WebDriver http://www.seleniumhq.org/projects/webdriver/, which has bindings for Java, C#, Ruby, Python and JavaScript, could do the job.
As "Detect broken SSL or insecure content warning with Selenium, BrowserStack, & Node.js" suggests, a simple script that just checks the Firefox WebDriver logs is an easy solution.
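For the static part (the tags from the question), here is a sketch that uses Python's built-in html.parser instead of a regex or XPath, so attribute order, quoting and letter case don't matter. The tag/attribute set mirrors the question's list; find_insecure_refs and InsecureRefFinder are hypothetical names. It only sees the markup as served, so URLs assembled in JavaScript, CSS url() references and XHR calls are out of its reach; that dynamic part still needs a real browser as described above.

```python
from __future__ import annotations

from html.parser import HTMLParser

# (tag, attribute) pairs taken from the list in the question. Note that a plain
# <a href="http://..."> link is navigation, not a subresource load, so browsers
# do not report it as mixed content; it is kept only to match the question.
URL_ATTRIBUTES = {
    ("img", "src"), ("script", "src"), ("object", "data"),
    ("audio", "src"), ("video", "src"), ("form", "action"),
    ("iframe", "src"), ("embed", "src"), ("source", "src"),
    ("param", "value"), ("a", "href"),
}


class InsecureRefFinder(HTMLParser):
    """Collects (tag, attribute, url) triples whose URL uses plain http://."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        # Self-closing tags like <img ... /> also end up here by default.
        for name, value in attrs:
            if (tag, name) in URL_ATTRIBUTES and value:
                url = value.strip()
                if url.lower().startswith("http://"):
                    self.found.append((tag, name, url))


def find_insecure_refs(html: str) -> list[tuple[str, str, str]]:
    finder = InsecureRefFinder()
    finder.feed(html)
    finder.close()
    return finder.found


html = (
    '<img src="http://example.com/a.png">'
    '<script src="/app.js"></script>'
    '<iframe src="//example.com/frame"></iframe>'
    '<a href="HTTP://example.org/">link</a>'
)
for tag, attr, url in find_insecure_refs(html):
    print(f"<{tag} {attr}> -> {url}")
```

Relative and protocol-relative URLs (/app.js, //example.com/frame) are not flagged because on an HTTPS page they resolve to HTTPS.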
Upvotes: 2
Reputation: 100
You should be able to run Chromium in headless mode (no graphics) and enable debugging output to see all the URLs that the site (or rather, the browser) is requesting.
Once you have the URL list, apply your rules to decide what is safe and what is not.
https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
Upvotes: 0