Reputation: 539
Ok, I will try to explain my problem as best as I can, from the code snippet below I’m passing tempURLTestedVsCaptured, to my method CheckForDuplicates_capturedUrls.
This method will check if there are any duplicates, if there are not, it will add the URL (contained within my URLs object) to a new URLs object. Then once done, it will set the original tempURLs to the reference of the new object.
The problem I have tempURLTestedVsCaptured is not getting the new reference. If I watch the tempURLs, it has the correct value at the end of the method, when it jumps out back to the Crawl method, the tempURLTestedVsCaptured has returned to the original value.
If I change tempURLs, for example adding a URL to it, the changes are reflected.
If I do:
tempURLs = new URLs();
tempURLs = processedURLs;
It won’t pick up the change. I’m clearly missing something very fundermental here in my learning, but I can’t put my finger on it.
private void CheckForDuplicates_capturedUrls(URLs tempURLs)
{
URLs unprocessedURLs = (URLs)tempURLs;
URLs processedURLs = new URLs();
foreach (URL url in unprocessedURLs)
{
if (!crawlContext.capturedUrls.ContainsURL(url))
{
processedURLs.AddURL(url);
}
}
tempURLs = new URLs();
tempURLs = processedURLs;
}
private void Crawl(WebScraper_Context crawlContext)
{
URLs tempURLTestedVsVisited = new URLs();
URLs tempURLTestedVsCaptured = new URLs();
while (crawlContext.unVistedURLs.Count() != 0) //While we have URLS we not visited, continue
{
foreach (URL url in crawlContext.unVistedURLs)
{
// If we not visted the page yet
if (!crawlContext.vistedURLs.ContainsURL(url)) // Vist the URL if there is one
{
crawlContext.vistedURLs.AddURL(url);
LoadPage(url.url);
doc = GetSubSetXPath(doc, crawlContext.xPath);
}
if (doc != null)
{
crawlContext.scrapedUrls = ScrapeURLS();
crawlContext.scrapedUrls = GetLocalUrls(crawlContext.scrapedUrls);
// Cache the URLS into, so we can check if we seen them before
foreach (URL newURL in crawlContext.scrapedUrls)
{
if (!tempURLTestedVsVisited.ContainsURL(newURL))
{
tempURLTestedVsVisited.AddURL(newURL);
tempURLTestedVsCaptured.AddURL(newURL);
}
else
{
System.Windows.Forms.MessageBox.Show("Duplicate URL found in scraped URLS");
}
}
**this.CheckForDuplicates_capturedUrls(tempURLTestedVsCaptured);**
foreach (URL newURL in crawlContext.scrapedUrls)
{
if (tempURLTestedVsVisited.ContainsURL(newURL) && tempURLTestedVsCaptured.ContainsURL(newURL))
{
crawlContext.newURLs.AddURL(newURL);
crawlContext.capturedUrls.AddURL(newURL);
}
}
}
}
crawlContext.unVistedURLs = new URLs(); crawlContext.unVistedURLs = crawlContext.newURLs;
crawlContext.newURLs = new URLs();
}
if (RequestStop == true)
{
RequestStop = false;
}
System.Windows.Forms.MessageBox.Show("Complete");
}
Ok T. Kiley completely explains my problem and why I’m getting it. The reason I’m not returning URLS, and I’m doing a pointless cast, is because the method signature is planned to be:
private void CheckForDuplicates_capturedUrls(object tempURLs).
The method is going to be used as a thread start “DuplicateCheckerB = new Thread(this.CheckForDuplicates_capturedUrls);” and “DuplicateCheckerA.Start(tempURLTestedVsVisited);” I originally thought my problem was down to threading, so I stripped it in in process of debugging.
Now, would I be right in thinking that I have to modify the actual object to remove the URLs if I am going to pass it to thread?
Upvotes: 0
Views: 95
Reputation: 33738
You are not passing your method parameter by reference:
private void CheckForDuplicates_capturedUrls(ref URLs tempURLs) {...}
and call it thusly:
CheckForDuplicates_capturedUrls(ref tempURLTestedVsCaptured);
Alternatively, and preferably, just return a new list:
private URLs CheckForDuplicates_capturedUrls(URLs tempURLs) {
URLs result = new URLs();
// process tempURLs, storing the result
return result;
}
and use it like so:
tempURLTestedVsCaptured = CheckForDuplicates_capturedUrls(tempURLTestedVsCaptured);
Upvotes: 3
Reputation: 415690
You could fix this with a reference parameter, but a better design is to return your new list, following this pattern:
tempUrls = CheckForDuplicates_capturedUrls(URLs tempURLs)
.
private URLs CheckForDuplicates_capturedUrls(URLs tempURLs)
{
//It's not clear what options your URLs type has, but based on
// your example use, it looks like you might be inheriting List<url>
// or implementing IList. This code makes that assumption
return tempURLs.Where(url => !crawlContext.capturedUrls.ContainsURL(url)).ToList();
}
We even get the code down to a one-liner :)
The original code did not work because .Net passes references to functions by value. This means the function gets a reference that refers to the same object, but is still just a copy. The function can make changes to properties in an object, or call it's methods, and those changes will be evident to the caller of the function. However, if the function tries to make an assignment directly to the variable itself, it will only change the copy.
Upvotes: 1
Reputation: 38468
In C# parameters are by passed by value. That means you get a copy of the reference in the method you are calling. Any changes you make to that copy won't effect the other reference which lives outside of that method call. By using ref
keyword for the parameter you can make changes to the original reference but don't do that. Returning new value from the method is the better way.
private URLs CheckForDuplicates_capturedUrls(URLs tempURLs) {...}
Upvotes: 1
Reputation: 2802
There are a few things wrong with your code, but taking your issue with your first method you first need to understand the difference between pass by reference and pass by value. This concept is more confusing in languages like C# and Java (than C++ [imo]) since it looks like it is passing by reference when in fact, it (almost) always passes by value.
Basically, when a function passes by value, the value of the parameter is copied to the function. Clearly, if you do this, any modification made will be lost.
Pass by reference, on the other hand, simply tells the function where to look, so modifications will be kept.
C# is pass by value, but the value you are passing is a pointer
What this means in practise is, you pass in your pointer to your tempUrls. Were you to modify the thing this pointer is pointing at, you would modify the thing you passed in.
However, what you do in your method:
tempURLs = processedURLs;
Points tempUrls to a new thing. We passed the address in by value, so changing the address won't do anything. The caller is still looking in the old location.
You can get around this using the ref
keyword but in this method it would probably be better to return the new tempUrls
Other issues Your code as a few other problems:
new
if you actually want a new objectUpvotes: 2
Reputation: 31
In C# objects are passed by reference. However when you provide a method an object it makes a copy of the pointer (a reference is a safe pointer) then provides it to the function.
Add 'ref' to the method call and you will pass the original reference.
private void CheckForDuplicates_capturedUrls(ref URLs tempURLs)
Upvotes: -3