Reputation: 694
I am concerned about the safety of fetching content from an unknown URL in PHP.
We will basically use cURL to fetch the HTML content from a user-provided URL and look for Open Graph meta tags, so we can show the links as content cards.
Because the URL is provided by the user, I am worried about the possibility of getting malicious code in the process.
I have another question: does curl_exec actually download the full file to the server? If yes, could viruses or malware be downloaded when using cURL?
Upvotes: 3
Views: 1338
Reputation: 2819
Expanding on the answer made by Ray Radin.
He is correct that if you use a sound process to search the fetched resource, there should be no problem in fetching whatever URL is provided.
Even though there is no foolproof way of validating what you are requesting with a specific URL, there are ways you can make your life easier and prevent some potential issues. For example, a URL might point to a large binary, a large image file, or something similar.
Make a HEAD request first to get the header information. Then look at the Content-Type and Content-Length headers to see if the content is a plain-text HTML file.
You should not trust these headers, however, since they can be spoofed. Still, checking them will make sure that even non-malicious content won't crash your script. Requesting image files is presumably something you don't want users to do anyway.
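The HEAD-then-check idea can be sketched with cURL roughly as follows; the URL and the 1 MB size cap are placeholders, and `looks_like_html` is a hypothetical helper for this example:

```php
<?php
// Sketch: make a HEAD request first, then sanity-check the reported
// Content-Type and Content-Length before fetching the full body.
// Remember: these headers can be spoofed, so this is a convenience
// filter, not a security boundary.

function looks_like_html(?string $contentType, float $contentLength): bool
{
    // Reject anything that does not claim to be HTML.
    if ($contentType === null || stripos($contentType, 'text/html') !== 0) {
        return false;
    }
    // Reject anything claiming to be larger than 1 MB (arbitrary cap).
    return $contentLength <= 1024 * 1024;
}

$url = 'https://example.com/'; // placeholder for the user-provided URL

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_NOBODY         => true,  // HEAD request: headers only, no body
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_MAXREDIRS      => 3,
    CURLOPT_TIMEOUT        => 5,
]);
curl_exec($ch);
$type   = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);            // e.g. "text/html; charset=utf-8"
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if not reported
curl_close($ch);

if (looks_like_html($type, (float) $length)) {
    // Reasonably safe to do the real GET request here.
}
```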
I recommend using Guzzle to make your requests since, in my opinion, it provides some functionality that should make this easier.
Upvotes: 1
Reputation: 167
It is safe, but you will need to do a proper data check before using it, as you should with any data input anyway.
Upvotes: 0
Reputation: 21
You can use httpclient.class instead of file_get_contents or cURL, because it connects to the page through a socket. After downloading the data, you can extract the meta data using preg_match.
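A fragile but illustrative sketch of the preg_match approach this answer suggests; `og_title_from_html` is a hypothetical helper, and a real HTML parser is more robust against attribute-order and quoting variations:

```php
<?php
// Sketch: pull one Open Graph tag out of already-downloaded HTML with a
// regular expression. This only matches this exact attribute order and
// double-quoted values; real-world HTML often needs a proper parser.

function og_title_from_html(string $html): ?string
{
    if (preg_match('/<meta\s+property="og:title"\s+content="([^"]*)"/i', $html, $m)) {
        return $m[1];
    }
    return null; // tag not found, or written in a form the regex misses
}

$title = og_title_from_html('<meta property="og:title" content="Example Title">');
// $title === 'Example Title'
```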
Upvotes: 1
Reputation: 2488
The short answer is that file_get_contents is safe for retrieving data, and so is cURL. It is up to you what you do with that data.
A few guidelines:
1. Never run eval on that data.
2. Don't save it to a database without filtering.
3. Don't even use file_get_contents or curl directly.
Use get_meta_tags instead:
array get_meta_tags ( string $filename [, bool $use_include_path = false ] )
// Example
$tags = get_meta_tags('http://www.example.com/');
You will get all the meta tags parsed and filtered into an array. (Note, though, that get_meta_tags only picks up meta tags with a name attribute, so Open Graph tags, which use a property attribute, may require a proper HTML parser instead.)
Upvotes: 1
Reputation: 6363
Using cURL is similar to using fopen() and fread() to fetch content from a file.
Whether it is safe or not depends on what you're doing with the fetched content.
From your description, your server works as some kind of intermediary that extracts specific subcontent from a fetched HTML content. Even if the fetched content contains malicious code, your server never executes it, so no harm will come to your server.
Additionally, because your server only extracts specific subcontent (Open Graph meta tags, as you say), everything else that is not what you're looking for in the fetched content is ignored, which means your users are automatically protected.
Thus, in my opinion, there is no need to worry. Of course, this relies on the assumption that the content extraction process is sound. Someone should take a look at it and confirm it.
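Such an extraction step might look like the following sketch, which uses PHP's DOM extension; the hard-coded HTML stands in for whatever cURL fetched, and `extract_og_tags` is a hypothetical helper name:

```php
<?php
// Sketch: extract only Open Graph meta tags from already-fetched HTML.
// Everything else in the document is ignored, which is what makes this
// kind of intermediary reasonably safe: the fetched content is parsed
// as data, never executed.

function extract_og_tags(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings about malformed real-world HTML.
    @$doc->loadHTML($html);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        if (strpos($property, 'og:') === 0) {
            // Treat the value as plain text; escape it before rendering.
            $tags[$property] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

$html = '<html><head>'
      . '<meta property="og:title" content="Example Title">'
      . '<meta property="og:image" content="http://example.com/img.png">'
      . '</head><body><script>/* never executed */</script></body></html>';

$tags = extract_og_tags($html);
// $tags: ['og:title' => 'Example Title', 'og:image' => 'http://example.com/img.png']
```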
does curl_exec actually download the full file to the server?
It depends on what you mean by "full file". If you mean "the entire HTML content", then yes. If you mean "including all the CSS and JS files that the fetched HTML content may refer to", then no.
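A minimal sketch of this, with example.com as a placeholder URL: curl_exec downloads only the single resource you asked for, and any CSS/JS/images it references would each need their own request.

```php
<?php
// Sketch: curl_exec fetches exactly one resource -- the HTML document
// at the given URL. Nothing it references is downloaded.
$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // don't hang on slow hosts
$html = curl_exec($ch); // the raw HTML as a string, or false on failure
curl_close($ch);
```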
is it possible that viruses or malware be downloaded when using curl?
The answer is yes. The fetched HTML content may contain malicious code; however, as long as you don't execute it, no harm will come to you.
Again, I'm assuming that your content extraction process is sound.
Upvotes: 8