Reputation: 333
We have a DataTable of addresses that I am trying to geocode.
We loop over the DataTable rows, send a request to the Google Geocoding API for each one using WebClient.DownloadStringAsync(Uri, Object), and update the row in the DataTable when the response arrives.
After all threads have completed, we need to update the database.
To do this we start each request with Task.Factory.StartNew and keep track of the tasks so that we can wait for all of them to complete.
This takes over 10 minutes for 8000 addresses.
Is this normal, or is there a better approach?
Any suggestions are appreciated.
Trimmed-down code is below for reference:
DataTable dataTable = new DataTable();
String url = "https://maps.googleapis.com/maps/api/geocode/json?address={0}&key={1}";
List<Task> tasks = new List<Task>();
int i = 0;
foreach (DataRow row in dataTable.Rows) //8000 + rows
{
    Uri uriWithAddress = new Uri(String.Format(url, new[] {
        "full_address",
        "apiKey"
    }));
    tasks.Add(Task.Factory.StartNew(() => {
        using (System.Net.WebClient client = new System.Net.WebClient())
        {
            client.DownloadStringCompleted += (o, a) =>
            {
                //when finished... do some work like lock datatable
                //and change some values etc
            };
            client.DownloadStringAsync(uriWithAddress, i);
            i++;
        }
    }));
}
Task.WaitAll(tasks.ToArray());
Upvotes: 0
Views: 448
Reputation: 3930
A few suggestions:
1) Increase ServicePointManager.DefaultConnectionLimit, which defaults to 2 concurrent connections.
2) You may have high thread contention if every result handler locks the table. If you don't have memory constraints, consider adding the results to a ConcurrentDictionary instead.
3) Split the requests into batches to avoid exhausting your connection pool.
4) Small code comments:
- If you are using the default settings, use Task.Run instead of Task.Factory.StartNew.
- i++ has a race condition and may not be accurate. You can use Interlocked.Increment instead.
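To make the suggestions above concrete, here is a rough sketch of how they might fit together. It is not a drop-in replacement for your code. GeocodeSketch, FetchAllAsync, DownloadAsync, RunAsync and the limit of 20 are names and values I made up for illustration, and I am assuming .NET Framework, where ServicePointManager.DefaultConnectionLimit applies to WebClient. A SemaphoreSlim caps the number of requests in flight (a simple alternative to fixed batches), the raw responses go into a ConcurrentDictionary keyed by row index, and the index comes from Select, so there is no shared counter to increment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

public static class GeocodeSketch
{
    // Hypothetical helper: runs fetch for every address with at most
    // maxConcurrency calls in flight; responses are keyed by row index.
    public static async Task<ConcurrentDictionary<int, string>> FetchAllAsync(
        IReadOnlyList<string> addresses,
        Func<string, Task<string>> fetch,
        int maxConcurrency)
    {
        var results = new ConcurrentDictionary<int, string>();
        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            // Select supplies the index, so no shared i++ is needed.
            var tasks = addresses.Select(async (address, index) =>
            {
                await throttle.WaitAsync();
                try
                {
                    results[index] = await fetch(address);
                }
                finally
                {
                    throttle.Release();
                }
            }).ToList();

            // Completes only after every download has actually finished.
            await Task.WhenAll(tasks);
        }
        return results;
    }

    // Hypothetical helper: one geocoding request using the URL from the question.
    public static async Task<string> DownloadAsync(string address, string apiKey)
    {
        var uri = new Uri(string.Format(
            "https://maps.googleapis.com/maps/api/geocode/json?address={0}&key={1}",
            Uri.EscapeDataString(address),
            Uri.EscapeDataString(apiKey)));
        using (var client = new WebClient())
        {
            return await client.DownloadStringTaskAsync(uri);
        }
    }

    // Example wiring; 20 is an assumed limit, tune it against your API quota.
    public static Task<ConcurrentDictionary<int, string>> RunAsync(
        IReadOnlyList<string> addresses, string apiKey)
    {
        ServicePointManager.DefaultConnectionLimit = 20;
        return FetchAllAsync(addresses, a => DownloadAsync(a, apiKey), 20);
    }
}
```

Once RunAsync completes you can walk the dictionary on a single thread and update the DataTable without any locking. Passing the download step in as a delegate is only there so the throttling can be tried with a fake fetch before pointing it at the real API.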
Upvotes: 1