Crolog Mark

Reputation: 41

How to download a page source in batch

I was wondering how I would download the XML source of any webpage in batch.

Say I was downloading view-source:https://www.google.com/, how would I get the text and save it as a TXT file on my computer?

The less it relies on other languages the better; it needs to be batch or PowerShell at the least.

EDIT: To clarify, I am not filtering anything out here, I just want the raw XML.

Upvotes: 0

Views: 3043

Answers (2)

Reino

Reputation: 3423

From the command-line you can use ...

curl.exe -s -o "output.txt" https://www.google.com/
curl.exe -s https://www.google.com/ > "output.txt"

...or

xidel.exe -s https://www.google.com/ --download "output.txt"
xidel.exe -s https://www.google.com/ -e "$raw" > "output.txt"

Upvotes: 2

Worthwelle

Reputation: 1271

PowerShell 2.0+

In PowerShell 2.0+, you can run the following code to download a website's HTML/XML to a file:

$webclient = new-object system.net.webclient;
$webclient.DownloadString('https://www.google.com/') | Set-Content -Path .\file.txt

You can reduce this to one line as:

(new-object system.net.webclient).DownloadString('https://www.google.com/') | Set-Content -Path .\file.txt

which can be run from the command line as:

powershell.exe -executionpolicy bypass -command "(new-object system.net.webclient).DownloadString('https://www.google.com/') | Set-Content -Path .\file.txt"

PowerShell 3.0+

In PowerShell 3.0+, you can run the following code to download a website's HTML/XML to a file (as suggested by Squashman):

$R = Invoke-WebRequest -URI https://www.google.com/
$R.Content | Set-Content -Path .\file.txt

You can reduce this to one line as:

(Invoke-WebRequest -URI https://www.google.com/).Content | Set-Content -Path .\file.txt

which can be run from the command line as:

powershell.exe -executionpolicy bypass -command "(Invoke-WebRequest -URI https://www.google.com/).Content | Set-Content -Path .\file.txt"

In most cases, you'll also want to normalize line endings, since many pages use only \n. Some Windows text editors (such as older versions of Notepad) do not render a bare \n as a line break, so it makes sense to replace each one with \r\n.
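
The line-ending replacement described above could look roughly like this. It is a minimal sketch, assuming PowerShell 3.0+ and the same URL and output file as the earlier examples; the variable names $content and $normalized are only illustrative:

```powershell
# Assumption: PowerShell 3.0+ (Invoke-WebRequest is not available in 2.0).
$content = (Invoke-WebRequest -Uri 'https://www.google.com/').Content

# Match a line feed with an optional preceding carriage return, so lines that
# already end in \r\n are not turned into \r\r\n, and replace it with CR+LF.
# The pattern is a regex; the replacement uses PowerShell backtick escapes.
$normalized = $content -replace '\r?\n', "`r`n"

# Write the normalized text to disk.
Set-Content -Path .\file.txt -Value $normalized
```

On PowerShell 2.0 the same -replace expression should work on the string returned by DownloadString, since only the download call differs.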

Upvotes: 3
