Jenix

Reputation: 3076

Downloading Large Google Drive files with WebClient in C#

I know there are tons of questions on this subject already. After reading the threads, I decided to extract the redirect URL from the confirmation HTML page and then use it as a direct download link.

As you know, the direct download link normally has this format:

https://drive.google.com/uc?export=download&id=XXXXX..

But if the target file is large, the link needs an extra confirm parameter:

https://drive.google.com/uc?export=download&confirm=RRRR&id=XXXXX..

I can get RRRR from the data returned by the first request, so I need two requests to download the real file. The concept is simple enough, but I can't get it to work.

class Test
{
    class MyWebClient: WebClient
    {
        CookieContainer c = new CookieContainer();

        protected override WebRequest GetWebRequest(Uri u)
        {
            var r = (HttpWebRequest) base.GetWebRequest(u);
            r.CookieContainer = c;
            return r;
        }
    }

    static string GetRealURL(string filename)
    {
        // Some Jobs to Parse....
        return directLink;
    }

    static void Main()
    {
        MyWebClient wc = new MyWebClient();

        string targetLink = "https://drive.google.com/uc?export=download&id=XXXXXXX";
        wc.DownloadFile(targetLink, "tempFile.tmp");

        targetLink = GetRealURL("tempFile.tmp");
        wc.DownloadFile(targetLink, "realFile.dat");
    }
}

What did I do wrong? I can get the right download link from the first file, but on the second try I get another confirmation page with a different confirm code. I thought this was because of cookies, so I created my own WebClient class, as you can see above.

I originally used DownloadFileAsync() and switched to DownloadFile() just in case, but the result is the same. I still think it has something to do with cookies.

What am I missing here?

Upvotes: 2

Views: 2317

Answers (1)

Willster419

Reputation: 122

I had this same problem and had solved it with HttpClient. I tried your WebClient approach and was able to get it to work. You don't show your GetRealURL() source, but I'm willing to bet that is where the issue lies. Here's how I did it:

  • You need to parse the HTML response to get the URL in the href attribute of the "download anyway" button. It will only contain the relative URL (the /uc?export=download... part).
  • You need to replace the HTML-escaped ampersand &amp; with a plain &.
  • Then you can build the full URL by prepending the domain https://drive.google.com.

At which point you can download the file. Here's the source (used in a test WPF application):

class MyWebClient : WebClient
{
    CookieContainer c = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri u)
    {
        var r = (HttpWebRequest)base.GetWebRequest(u);
        r.CookieContainer = c;
        return r;
    }
}

private async void WebClientTestButtonGdrive_Click(object sender, RoutedEventArgs e)
{
    using (MyWebClient client = new MyWebClient())
    {
        //get the warning page
        string htmlPage = await client.DownloadStringTaskAsync("https://drive.google.com/uc?id=XXXXXXX&export=download");

        //use HtmlAgilityPack to get the url with the confirm parameter in the url
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(htmlPage);
        HtmlNode node = document.DocumentNode;
        HtmlNode urlNode = node.SelectSingleNode(@"//a[contains(@href, 'XXXXXXX') and contains(@id, 'uc-download-link')]//@href");
        string downloadUrl = urlNode.Attributes["href"].Value;
        // the href value is HTML-escaped, so turn "&amp;" back into "&"
        downloadUrl = downloadUrl.Replace("&amp;", "&");
        downloadUrl = "https://drive.google.com" + downloadUrl;

        //download the file
        if (File.Exists("FileToDownload.zip"))
            File.Delete("FileToDownload.zip");
        await client.DownloadFileTaskAsync(downloadUrl, "FileToDownload.zip");
    }
}
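
If you'd rather not pull in HtmlAgilityPack just for this, the three steps above (find the href, unescape it, prepend the domain) can be sketched with a regex and WebUtility.HtmlDecode. This is only a rough sketch: DriveLinkHelper and BuildConfirmedUrl are names I made up for illustration, and it assumes the warning page still contains an anchor with id="uc-download-link" and a double-quoted, relative href, which Google may change at any time.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class DriveLinkHelper
{
    // Hypothetical helper (not from the code above).
    // Assumption: the warning page has <a id="uc-download-link" href="/uc?...">.
    public static string BuildConfirmedUrl(string htmlPage)
    {
        // Step 1a: find the whole opening <a ...> tag carrying the expected id,
        // so the order of the id and href attributes does not matter.
        Match tag = Regex.Match(
            htmlPage,
            "<a\\b[^>]*\\bid=\"uc-download-link\"[^>]*>",
            RegexOptions.IgnoreCase);
        if (!tag.Success)
            return null; // no "download anyway" link found in this page

        // Step 1b: pull the href value out of that tag.
        Match href = Regex.Match(tag.Value, "\\bhref=\"([^\"]+)\"", RegexOptions.IgnoreCase);
        if (!href.Success)
            return null;

        // Step 2: the attribute value is HTML-escaped, so decode &amp; to &.
        string relative = WebUtility.HtmlDecode(href.Groups[1].Value);

        // Step 3: resolve the relative URL against the Drive domain.
        return new Uri(new Uri("https://drive.google.com"), relative).AbsoluteUri;
    }
}
```

In the click handler this would replace the HtmlAgilityPack lines: call DriveLinkHelper.BuildConfirmedUrl(htmlPage) and, if the result is not null, pass it to DownloadFileTaskAsync on the same MyWebClient instance so the cookies from the first request are reused.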

Upvotes: 1
