Reputation: 43
We have an object (XML or JSON) that we map to a DTO successfully, but it takes too long (5–7 minutes) to insert into our database, so we switched to Parallel.ForEach
. Eventually, however, we noticed that some data was inserted incorrectly. For example, in one Category
every item had the same name while its other properties were 100% correct; in another case, all the data within one category was identical, even though the provided JSON object doesn't contain that.
I admit it is fast, taking less than a minute, but the insertions are wrong. Have a look at the code below:
JSON
[
  {
    "CategoryId": 1,
    "CategoryName": "Drinks",
    "SortOrder": 1,
    "Products": [
      {
        "ProductId": 100,
        "ProductName": "Black Tea",
        "SortOrder": 1,
        "Price": 5,
        "Choices": []
      },
      {
        "ProductId": 101,
        "ProductName": "Turkish Coffee",
        "SortOrder": 2,
        "Price": 7.5,
        "Choices": []
      },
      {
        "ProductId": 102,
        "ProductName": "Green Tea",
        "SortOrder": 3,
        "Price": 6,
        "Choices": []
      },
      {
        "ProductId": 103,
        "ProductName": "Café Latte Medium",
        "SortOrder": 4,
        "Price": 10,
        "Choices": []
      },
      {
        "ProductId": 104,
        "ProductName": "Orange Juice",
        "SortOrder": 5,
        "Price": 11,
        "Choices": []
      },
      {
        "ProductId": 105,
        "ProductName": "Mixed Berry Juice",
        "SortOrder": 6,
        "Price": 12.5,
        "Choices": []
      }
    ]
  },
  {
    "CategoryId": 1,
    "CategoryName": "Meals",
    "SortOrder": 1,
    "Products": [
      {
        "ProductId": 200,
        "ProductName": "Breakfast Meal",
        "SortOrder": 1,
        "Price": 16,
        "Choices": [
          {
            "ChoiceId": 3000,
            "ChoiceName": "Strawberry Jam",
            "SortOrder": 1,
            "Price": 0
          },
          {
            "ChoiceId": 3001,
            "ChoiceName": "Apricot Jam",
            "SortOrder": 2,
            "Price": 0
          },
          {
            "ChoiceId": 3002,
            "ChoiceName": "Orange Jam",
            "SortOrder": 3,
            "Price": 0
          },
          {
            "ChoiceId": 3003,
            "ChoiceName": "Café Latte",
            "SortOrder": 4,
            "Price": 2
          }
        ]
      },
      {
        "ProductId": 201,
        "ProductName": "Mixed Grill",
        "SortOrder": 1,
        "Price": 30,
        "Choices": [
          {
            "ChoiceId": 3004,
            "ChoiceName": "Moutabal",
            "SortOrder": 1,
            "Price": 0
          },
          {
            "ChoiceId": 3005,
            "ChoiceName": "Mineral Water",
            "SortOrder": 2,
            "Price": 0
          },
          {
            "ChoiceId": 3006,
            "ChoiceName": "French Fries",
            "SortOrder": 2,
            "Price": 0
          },
          {
            "ChoiceId": 3007,
            "ChoiceName": "Grilled Potatoes",
            "SortOrder": 2,
            "Price": 0
          }
        ]
      }
    ]
  }
]
C# code
Parallel.ForEach(categories, (category) =>
{
    var newCreatedCategoryId = 0;
    using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
    {
        connection.Open();
        using (var command = new SqlCommand("SP_INSERT_INTO_CATEGORIES", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@P1", category.CategoryName);
            command.Parameters.AddWithValue("@P2", category.SortOrder);
            newCreatedCategoryId = int.Parse(command.ExecuteScalar().ToString());
            command.Dispose();
        }
        connection.Close();
    }
    if (newCreatedCategoryId > 0)
    {
        Parallel.ForEach(category.Products, (product) =>
        {
            using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
            {
                connection.Open();
                using (var command = new SqlCommand("SP_INSERT_INTO_PRODUCTS", connection))
                {
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.AddWithValue("@P1", product.ProductName);
                    command.Parameters.AddWithValue("@P2", product.Price);
                    command.Parameters.AddWithValue("@P3", product.SortOrder);
                    command.Parameters.AddWithValue("@P4", newCreatedCategoryId);
                    command.ExecuteNonQuery();
                    command.Dispose();
                }
                connection.Close();
            }
        });
    }
});
I had a look here, but that is not our issue; we are already using SCOPE_IDENTITY()
to get the last generated identity in the current scope of execution.
On the other hand, we are not allowed to use SqlBulkCopy
to insert this amount of data, even without TableLock
.
Upvotes: 1
Views: 2579
Reputation: 81583
It's the newCreatedCategoryId
that is the problem. What confuses me is why you would call newCreatedCategoryId = int.Parse(command.ExecuteScalar().ToString());
again in the inner loop; if it's just the id of a category, it doesn't need to be incremented again.
Take a look at the edit below. You might also be better off turning the second Parallel.ForEach
into a standard foreach
; this is all running in parallel anyway. Lastly, Parallel.ForEach is not really suited to IO-bound tasks; the correct pattern is async and await. That said, you could probably use an ActionBlock from TPL Dataflow to get the best of both worlds. Take a look at the Dataflow example in this question I answered: Downloading 1,000+ files fast?
Parallel.ForEach(categories, (category) =>
{
    var newCreatedCategoryId = 0;
    using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
    {
        connection.Open();
        using (var command = new SqlCommand("SP_INSERT_INTO_CATEGORIES", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@P1", category.CategoryName);
            command.Parameters.AddWithValue("@P2", category.SortOrder);
            newCreatedCategoryId = int.Parse(command.ExecuteScalar().ToString());
            command.Dispose();
        }
        connection.Close();
    }
    if (newCreatedCategoryId > 0)
    {
        foreach (var product in category.Products)
        {
            using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
            {
                connection.Open();
                using (var command = new SqlCommand("SP_INSERT_INTO_PRODUCTS", connection))
                {
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.AddWithValue("@P1", product.ProductName);
                    command.Parameters.AddWithValue("@P2", product.Price);
                    command.Parameters.AddWithValue("@P3", product.SortOrder);
                    command.Parameters.AddWithValue("@P4", newCreatedCategoryId);
                    command.ExecuteNonQuery();
                    command.Dispose();
                }
                connection.Close();
            }
        }
    }
});
Upvotes: 5
Reputation: 633
The objects that you are looping over are not thread-safe. You could add a lock object, but that would serialise the operation and defeat the purpose of the Parallel.ForEach. You need to change the
Parallel.ForEach to a standard foreach loop.
Potential Pitfalls in Data and Task Parallelism
Upvotes: 1
Reputation: 1066
You change newCreatedCategoryId inside the Parallel.ForEach, which may cause incorrect data because the queries won't run in order.
Upvotes: 0