Reputation: 453
I have a strange problem with curl on Ubuntu 16.04. Each night I curl a large file from a remote API endpoint and save it to a folder, using the curl command inside a bash script. I have noticed that the curl command fails to save the output most of the time, so I created a script that checks whether the file has been saved and, if not, attempts the download again.
Each night I check the logs, and I can see that most of the time curl fails at least 5 times before saving the file:
file is empty - retrying download
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4768 0 4768 0 0 47 0 --:--:-- 0:01:40 --:--:-- 1236
file is empty - retrying download
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4768 0 4768 0 0 47 0 --:--:-- 0:01:40 --:--:-- 1266
file is empty - retrying download
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
It always stops after 1 minute and 40 seconds; I suspect this is when curl gives up.
The curl command I am using is:
curl -s "https://*.com/api/v2.0/*.*./?apitoken=123" > file.txt
The strange thing is that when I run curl without the quotes around the URL and don't save it to a file:
curl -s https://*.com/api/v2.0/*.*./?apitoken=123
I see the output straight away.
When I run it without quotes and redirect the output into a file:
curl -s https://*.com/api/v2.0/*.*./?apitoken=123 > file.txt
I see the file download, but the command output does not go into file.txt; it just appears in the shell.
If I use -o with curl to specify the output, I get the same timeout issue as above, and it often fails multiple times before working.
All my other curl commands work, so I suspect this is an issue because it's a big file.
When I run curl with -v, I can see that after it fails I receive a 524 error from Cloudflare: HTTP/1.1 524 Origin Time-out
What I don't understand is:
- Why my original command fails 90% of the time before eventually succeeding
- Why it downloads straight away after removing the quotes from the URL
- Why, without the quotes, it doesn't save to a file with "curl > file"
Can anybody shed light on this and show a proper method of curling a large file (preferably redirecting the output to a file rather than using -o)?
Upvotes: 4
Views: 11652
Reputation: 3441
HTTP error 524 means that Cloudflare was able to complete a TCP connection to the origin server but didn't receive an HTTP response in time. By default Cloudflare waits 100 seconds for that response, which matches the 1 minute and 40 seconds in your logs. I think the origin server takes too long to retrieve the file, or something similar. Once the file has been requested several times, the server probably loads it somewhere for 'quick access', which is why not every command fails.
You could try adding the --connect-timeout option to give the server more time to respond (I would suggest 5 minutes).
--connect-timeout <seconds>
Maximum time in seconds that you allow the connection to the server to take. This only limits the connection phase, once curl has connected this option is of no more use. See also the -m/--max-time option. If this option is used several times, the last one will be used.
If the file is indeed too large, you could go on and add the -m option to extend the time you're allowed to download (how much time depends on your connection speed).
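As a rough sketch of the suggestion above, the command could be wrapped like this. The function name is hypothetical, and the 300 seconds simply reflects the "5 minutes" mentioned above rather than a tested value. The URL is passed in quotes so the shell does not interpret characters such as ? or & itself.

```shell
# Hypothetical helper: fetch URL $1 into file $2.
# --connect-timeout 300 allows up to 5 minutes for the connection phase only;
# it does not limit how long the transfer itself may take.
fetch_with_connect_timeout() {
    curl -s --connect-timeout 300 "$1" > "$2"
}
```

It would be called as: fetch_with_connect_timeout "https://*.com/api/v2.0/*.*./?apitoken=123" file.txt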
-m/--max-time <seconds>
Maximum time in seconds that you allow the whole operation to take. This is useful for preventing your batch jobs from hanging for hours due to slow networks or links going down. See also the --connect-timeout option. If this option is used several times, the last one will be used.
If everything still goes wrong, try adding the -D option and changing the -s option to -v.
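Putting both timeouts together with the retry check described in the question might look roughly like the sketch below. The function name, the 1800-second limit, the default of 5 attempts and the RETRY_DELAY variable are all assumptions for illustration, not values taken from the question. The -f option makes curl exit with an error on HTTP failures such as 524 instead of saving the error page, and [ -s file ] checks that the file is not empty. A client-side timeout cannot lengthen a timeout enforced by a proxy in between, so the retry loop is kept.

```shell
# Hypothetical helper: download URL $1 into file $2, trying up to $3 times.
download_with_retry() {
    dl_url=$1
    dl_out=$2
    dl_max=${3:-5}          # assumed default: 5 attempts
    dl_try=1
    while [ "$dl_try" -le "$dl_max" ]; do
        # -s: silent, -f: non-zero exit on HTTP errors (e.g. 524)
        # -m 1800: assumed 30-minute cap on the whole transfer
        if curl -sf --connect-timeout 300 -m 1800 "$dl_url" > "$dl_out" \
            && [ -s "$dl_out" ]; then
            return 0
        fi
        echo "attempt $dl_try of $dl_max failed" >&2
        dl_try=$((dl_try + 1))
        sleep "${RETRY_DELAY:-30}"   # assumed pause between attempts
    done
    return 1
}
```

It would be called as: download_with_retry "https://*.com/api/v2.0/*.*./?apitoken=123" file.txt 5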
-D/--dump-header Write the protocol headers to the specified file.
This option is handy to use when you want to store the headers that a HTTP site sends to you. Cookies from the headers could then be read in a second curl invocation by using the -b/--cookie option! The -c/--cookie-jar option is however a better way to store cookies.
When used in FTP, the FTP server response lines are considered being "headers" and thus are saved there.
If this option is used several times, the last one will be used.
-v/--verbose Makes the fetching more verbose/talkative. Mostly useful for debugging. A line starting with '>' means "header data" sent by curl, '<' means "header data" received by curl that is hidden in normal cases, and a line starting with '*' means additional info provided by curl.
Note that if you only want HTTP headers in the output, -i/--include might be the option you're looking for.
If you think this option still doesn't give you enough details, consider using --trace or --trace-ascii instead.
This option overrides previous uses of --trace-ascii or --trace.
Use -s/--silent to make curl quiet.
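For the debugging step, a minimal sketch could keep the body, the response headers and the verbose log in separate files. The function and file names here are hypothetical. curl writes its -v output to stderr, so it can be redirected independently of the body on stdout.

```shell
# Hypothetical helper: $1 = URL, $2 = body file, $3 = header file, $4 = verbose log.
debug_fetch() {
    # -D writes the response headers to $3; -v diagnostics go to stderr ($4).
    curl -v -D "$3" "$1" > "$2" 2> "$4"
}
```

It would be called as: debug_fetch "https://*.com/api/v2.0/*.*./?apitoken=123" file.txt headers.txt curl.log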
Upvotes: 2