Reputation: 41
I was wondering how I could download the XML source of any webpage from a batch script.
Say I wanted the text shown at view-source:https://www.google.com/; how would I get that text and save it as a TXT file on my computer?
The fewer calls to other languages the better; it needs to be at least batch or PowerShell.
EDIT: To clarify, I am not filtering anything out here, I just want the raw XML.
Upvotes: 0
Views: 3043
Reputation: 3423
From the command-line you can use curl...
curl.exe -s -o "output.txt" https://www.google.com/
curl.exe -s https://www.google.com/ > "output.txt"
...or xidel
xidel.exe -s https://www.google.com/ --download "output.txt"
xidel.exe -s https://www.google.com/ -e "$raw" > "output.txt"
Upvotes: 2
Reputation: 1271
In PowerShell 2.0+, you can run the following code to download a website's HTML/XML to a file:
$webclient = new-object system.net.webclient;
$webclient.DownloadString('https://www.google.com/') | Set-Content -Path .\file.txt
You can reduce this to one line as:
(new-object system.net.webclient).DownloadString('https://www.google.com/') | Set-Content -Path .\file.txt
which can be run from the command line as:
powershell.exe -ExecutionPolicy Bypass -Command "(new-object system.net.webclient).DownloadString('https://www.google.com/') | Set-Content -Path .\file.txt"
In PowerShell 3.0+, you can run the following code to download a website's HTML/XML to a file (as suggested by Squashman):
$R = Invoke-WebRequest -URI https://www.google.com/
$R.Content | Set-Content -Path .\file.txt
You can reduce this to one line as:
(Invoke-WebRequest -URI https://www.google.com/).Content | Set-Content -Path .\file.txt
which can be run from the command line as:
powershell.exe -ExecutionPolicy Bypass -Command "(Invoke-WebRequest -URI https://www.google.com/).Content | Set-Content -Path .\file.txt"
In most cases, you'll also need to add code to handle line endings, which are often only \n. Many Windows text editors (like Notepad) will not display those as line breaks, so it makes sense to replace them with \r\n.
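The line-ending replacement can be sketched as follows. This is a minimal sketch, not a definitive implementation: `ConvertTo-CrLf` is a hypothetical helper name, not a built-in cmdlet, and the sample string stands in for a downloaded page. It assumes you want every bare \n (or bare \r) turned into \r\n while leaving existing \r\n pairs alone.

```powershell
# Hypothetical helper (not a built-in cmdlet): normalize line endings to CRLF.
function ConvertTo-CrLf {
    param([string]$Text)
    # Match CRLF first, then a lone CR or a lone LF, and emit CRLF for each,
    # so text that already uses CRLF is not doubled up.
    return [regex]::Replace($Text, "\r\n|\r|\n", "`r`n")
}

# Stand-in for downloaded content: one bare LF and one existing CRLF.
$sample = "line1`nline2`r`nline3"
$fixed  = ConvertTo-CrLf $sample

# With a real download, the same helper would sit between the request and the file:
# ConvertTo-CrLf ((New-Object System.Net.WebClient).DownloadString('https://www.google.com/')) |
#     Set-Content -Path .\file.txt
```

Matching \r\n before the single-character alternatives is what keeps pages that already use Windows line endings unchanged.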
Upvotes: 3