Reputation: 427
This is the code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Net;
using System.Text.RegularExpressions;
using System.IO;
using unfreez_wrapper;
using Shell32;

namespace DownloadImages
{
    public partial class Form1 : Form
    {
        string f;
        string UrlsPath;
        int counter;
        UnFreezWrapper uf;
        string localFilename;

        public Form1()
        {
            InitializeComponent();
            uf = new UnFreezWrapper();
            counter = 0;
            localFilename = @"d:\localpath\";
            UrlsPath = @"d:\localpath\Urls\";
            using (WebClient client = new WebClient())
            {
                client.DownloadFile("http://www.sat24.com/foreloop.aspx?type=1&continent=europa#", localFilename + "test.html");
                client.DownloadFile("http://www.sat24.com/en/eu?ir=true", localFilename + "test1.html");
            }
            f = File.ReadAllText(localFilename + "test1.html");
            test("image2.ashx", "ir=true");
        }

        private void test(string firstTag, string lastTag)
        {
            List<string> imagesUrls = new List<string>();
            int startIndex = 0;
            int endIndex = 0;
            int position = 0;
            string startTag = firstTag; // "http://www.niederschlagsradar.de/images.aspx";
            string endTag = lastTag;    // "cultuur=en-GB&continent=europa";
            startIndex = f.IndexOf(startTag);
            while (startIndex > 0)
            {
                endIndex = f.IndexOf(endTag, startIndex);
                if (endIndex == -1)
                {
                    break;
                }
                string t = f.Substring(startIndex, endIndex - startIndex + endTag.Length);
                imagesUrls.Add(t);
                position = endIndex + endTag.Length;
                startIndex = f.IndexOf(startTag, position);
            }
            string item = imagesUrls[imagesUrls.Count - 1];
            imagesUrls.Remove(item);
            for (int i = 0; i < imagesUrls.Count; i++)
            {
                using (WebClient client = new WebClient())
                {
                    client.DownloadFile(imagesUrls[i], UrlsPath + "Image" + counter.ToString("D6"));
                }
                counter++;
            }
            List<string> files = Directory.GetFiles(UrlsPath).ToList();
            uf.MakeGIF(files, localFilename + "weather", 80, true);
        }
    }
}
First I download this page as an HTML file:
http://www.sat24.com/en/eu?ir=true
That page contains an animation made of 9 different images (GIFs). I want to download each GIF by its URL, so that I end up with 9 GIFs on the hard disk.
When I read the contents of http://www.sat24.com/en/eu?ir=true, I see:
var imageUrls = ["/image2.ashx?region=eu&time=201309162345&ir=true","/image2.ashx?region=eu&time=201309162330&ir=true","/image2.ashx?region=eu&time=201309162315&ir=true","/image2.ashx?region=eu&time=201309162300&ir=true","/image2.ashx?region=eu&time=201309162245&ir=true","/image2.ashx?region=eu&time=201309162230&ir=true","/image2.ashx?region=eu&time=201309162215&ir=true","/image2.ashx?region=eu&time=201309162200&ir=true","/image2.ashx?region=eu&time=201309162145&ir=true"];
And inside the List imagesUrls I see these 9 URLs.
For example, index 0 holds image2.ashx?region=eu&time=201309162345&ir=true. I also tried without the image2.ashx? prefix, but in both cases I get an error on this line:
client.DownloadFile(imagesUrls[i], UrlsPath + "Image" + counter.ToString("D6"));
ArgumentException: Illegal characters in path
Before this, when I used test.html with the two other start and end tags, it worked without any problem.
Now I am using test1.html with these two tags: test("image2.ashx", "ir=true"); and I get the exception.
When I took one image URL, for example image2.ashx?region=eu&time=201309170015&ir=true, and tried to open it in Chrome, I got no results; Chrome tried to search for it on Google instead.
It is not even a URL.
This is the full exception:
System.ArgumentException was unhandled
HResult=-2147024809
Message=Illegal characters in path.
Source=mscorlib
StackTrace:
at System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional)
at System.Security.Permissions.FileIOPermission.CheckIllegalCharacters(String[] str)
at System.Security.Permissions.FileIOPermission.AddPathList(FileIOPermissionAccess access, AccessControlActions control, String[] pathListOrig, Boolean checkForDuplicates, Boolean needFullPath, Boolean copyPathList)
at System.Security.Permissions.FileIOPermission..ctor(FileIOPermissionAccess access, String[] pathList, Boolean checkForDuplicates, Boolean needFullPath)
at System.IO.Path.GetFullPath(String path)
at System.Net.WebClient.GetUri(String path)
at System.Net.WebClient.DownloadFile(String address, String fileName)
at DownloadImages.Form1.test(String firstTag, String lastTag) in d:\C-Sharp\DownloadImages\DownloadImages\DownloadImages\Form1.cs:line 79
at DownloadImages.Form1..ctor() in d:\C-Sharp\DownloadImages\DownloadImages\DownloadImages\Form1.cs:line 45
at DownloadImages.Program.Main() in d:\C-Sharp\DownloadImages\DownloadImages\DownloadImages\Program.cs:line 19
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
So how can I download the images one by one from this URL? http://www.sat24.com/en/eu?ir=true
When I used test.html with the startTag "http://www.niederschlagsradar.de/images.aspx" and the endTag "cultuur=en-GB&continent=europa", it worked perfectly.
But now, with test1.html and the two different tags, it does not work.
Upvotes: 0
Views: 408
Reputation: 4232
At the point of the exception, what precisely is in imagesUrls[i]?
Are you saying that it is something like image2.ashx?region=eu&time=201309162345&ir=true?
If so, you need to prepend the protocol and server to it, i.e. prepend http://www.sat24.com/ to give a URI of http://www.sat24.com/image2.ashx?region=eu&time=201309162345&ir=true
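The stack trace in the question shows WebClient.GetUri calling Path.GetFullPath, which suggests the relative string is being treated as a local file path, hence "Illegal characters in path". A minimal sketch of the prepending step follows, assuming every extracted fragment is relative to the site root; the class name UrlSketch and the helper ToAbsolute are made up for illustration and are not part of the original code.

```csharp
using System;

static class UrlSketch
{
    // Assumption: all extracted fragments are relative to this site root.
    static readonly Uri BaseUri = new Uri("http://www.sat24.com/");

    // Hypothetical helper: combines the base with a relative fragment.
    // Works with or without a leading '/' on the fragment.
    public static Uri ToAbsolute(string relative)
    {
        return new Uri(BaseUri, relative);
    }

    static void Main()
    {
        Uri u = ToAbsolute("image2.ashx?region=eu&time=201309162345&ir=true");
        Console.WriteLine(u.AbsoluteUri);
    }
}
```

In the download loop, the resulting Uri could then be passed to client.DownloadFile in place of imagesUrls[i].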
However, another problem is that you are searching for image2.ashx as a start tag and then ir=true as an end tag. Looking at the source of that page, there are numerous image2.ashx URIs which do not end with ir=true.
e.g. http://www.sat24.com/image2.ashx?button=af260x160
When you find the start tag in that URI, you're going to get an enormous mass of HTML before you find the end tag.
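One way to avoid swallowing that mass of HTML is to let a match stop at the first quote, angle bracket, or whitespace, so that a URI without ir=true is simply skipped. The following is only a sketch under that assumption (that each URI sits inside quotes or is otherwise delimited in the page source); the class name ExtractSketch, the Extract helper, and the sample string are illustrative and not taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ExtractSketch
{
    // Matches "image2.ashx?...ir=true" but refuses to cross a quote,
    // an angle bracket, or whitespace, so a match stays inside one URI.
    static readonly Regex ImageUrl =
        new Regex("image2\\.ashx\\?[^\"'<>\\s]*ir=true");

    // Hypothetical helper: returns every matching fragment in page order.
    public static List<string> Extract(string html)
    {
        var result = new List<string>();
        foreach (Match m in ImageUrl.Matches(html))
        {
            result.Add(m.Value);
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample: one URI without ir=true, then two with it.
        string sample =
            "<img src=\"http://www.sat24.com/image2.ashx?button=af260x160\"> " +
            "var imageUrls = [\"/image2.ashx?region=eu&time=201309162345&ir=true\"," +
            "\"/image2.ashx?region=eu&time=201309162330&ir=true\"];";

        foreach (string url in Extract(sample))
        {
            Console.WriteLine(url);
        }
    }
}
```

With this kind of pattern the button URI is never collected, so there should be no need to drop the last list entry afterwards, as the original loop does.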
Upvotes: 1