Reputation: 5957
My console app reads a huge volume of data from text files and saves it to a database. For this purpose I am storing the data in a DataTable, and I want to dump this DataTable to the DB every 5 minutes (if I try to dump all the data at once, I have to fill the DataTable with the whole data set, and then I get an OutOfMemoryException).
public void ProcessData()
{
    string[] files = File.ReadAllLines(path);
    foreach (var item in files)
    {
        DataRow dtRow = dataTable.NewRow();
        dtRow["ID"] = ....   //some code here;
        dtRow["Name"] = .... //some code here;
        dtRow["Age"] = ....  //some code here;
        dataTable.Rows.Add(dtRow);
        var timer = new Timer(v => SaveData(), null, 0, 5 * 60 * 1000);
    }
}

public void SaveData(string tableName, DataTable dataTable)
{
    //Some code here
    //After dumping the data to the DB, clear the DataTable
    dataTable.Rows.Clear();
}
What I want here is for the code to keep filling the DataTable while a timer calls the SaveData() method every 5 minutes, and for this to continue until all files have been processed.
However, I have seen that when the SaveData() method is called, it executes 4-5 times, and sometimes it is not called every 5 minutes at all.
I am not sure how to proceed. How can I fix this? Is there another approach I could use? Any help is appreciated.
Upvotes: 1
Views: 3190
Reputation: 333
Here is a suggestion on how to implement the code, incorporating the suggestion from the other answer:
public void ProcessData()
{
    int i = 1;
    foreach (var item in File.ReadLines(path)) //This line has been edited
    {
        DataRow dtRow = dataTable.NewRow();
        dtRow["ID"] = ....   //some code here;
        dtRow["Name"] = .... //some code here;
        dtRow["Age"] = ....  //some code here;
        dataTable.Rows.Add(dtRow);
        if (i % 25 == 0) //you can change the 25 here to something else
        {
            SaveData(/* table name */, /* dataTable */);
        }
        i++;
    }
    SaveData(/* table name */, /* dataTable */); //save any remaining rows
}

public void SaveData(string tableName, DataTable dataTable)
{
    //Some code here
    //After dumping the data to the DB, clear the DataTable
    dataTable.Rows.Clear();
}
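The SaveData body is left as a stub above. As a minimal sketch of what it might look like, assuming SQL Server as the target and SqlBulkCopy for the bulk insert (the connection string is a placeholder, and the DataTable's column names are assumed to match the destination table's):

// Requires: using System.Data; using System.Data.SqlClient;
public void SaveData(string tableName, DataTable dataTable)
{
    // Hypothetical connection string; replace with your own.
    const string connectionString = "Server=.;Database=MyDb;Integrated Security=true;";

    using (var bulkCopy = new SqlBulkCopy(connectionString))
    {
        bulkCopy.DestinationTableName = tableName;
        // Assumes the DataTable column names match the destination table's columns.
        bulkCopy.WriteToServer(dataTable);
    }

    // After dumping the data to the DB, clear the DataTable.
    dataTable.Rows.Clear();
}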
Upvotes: 2
Reputation: 4059
Your biggest problem is instantiating new Timer instances in your foreach. A new Timer object on every iteration means multiple threads calling SaveData concurrently, which means dataTable is being processed and saved to the database multiple times concurrently, possibly (and likely) before its rows are cleared, thus duplicating much of your file into the database.
Before I provide a solution to the question as asked, I wanted to point out that saving data on a 5-minute interval has a distinct code smell to it. As has been pointed out, I would suggest an approach that loads and saves data based on some data size rather than an arbitrary time interval. That said, I will go ahead and address your question on the assumption that there is a reason you must go with a 5-minute interval save.
First, we need to set up our Timer correctly; you'll notice I create it outside of the foreach loop. A Timer keeps firing on an interval, not just waiting and executing once.
Second, we have to take steps to ensure thread-safe data integrity on our intermediate data store (in your case you used a DataTable, but I am using a List of a custom class, because DataTable is too costly for what we want to do). You'll notice I accomplish this by locking before updates to our List.
Updates to your data processing class:
// Minimal POCO assumed for the intermediate store; the original answer leaves this class undefined.
public class MyCustomClass
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

private volatile bool isComplete = false;
private readonly object DataStoreLock = new object();
private List<MyCustomClass> myDataStore = new List<MyCustomClass>();
private Timer myTimer;

public void ProcessData()
{
    myTimer = new Timer(SaveData, null, TimeSpan.Zero, TimeSpan.FromMinutes(5.0));
    foreach (var item in File.ReadLines(path))
    {
        var myData = new MyCustomClass()
        {
            ID = 0,                  // Some code here
            Name = "Some code here",
            Age = 0                  // Some code here
        };
        lock (DataStoreLock)
        {
            myDataStore.Add(myData);
        }
    }
    isComplete = true;
}

public void SaveData(object arg)
{
    // Our first step is to check whether the timed work is done.
    if (isComplete)
    {
        myTimer.Dispose();
        myTimer = null;
    }
    // Our next step is to create a local instance of the data store to work on, which
    // allows ProcessData to continue populating while our DB actions are being performed.
    List<MyCustomClass> lDataStore;
    lock (DataStoreLock)
    {
        lDataStore = myDataStore;
        myDataStore = new List<MyCustomClass>();
    }
    // Some DB code here, writing lDataStore to the database.
}
EDIT: I've changed the enumeration to go through ReadLines rather than ReadAllLines. Read the Remarks under the ReadLines method on MSDN. ReadAllLines is a blocking call that reads the entire file into memory, while ReadLines allows enumeration to proceed while the file is still being read. I can't otherwise imagine a scenario where your foreach would run for more than 5 minutes if the file had already been read entirely into memory.
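To make the difference concrete, here is a small sketch (the file path is a placeholder): ReadAllLines returns a fully materialized string[], while ReadLines returns a lazy IEnumerable<string> that yields lines as the file is read.

// ReadAllLines: the entire file is loaded into memory before the loop starts.
string[] allLines = File.ReadAllLines(@"C:\data\huge.txt"); // placeholder path
foreach (var line in allLines) { /* process */ }

// ReadLines: lines are yielded one at a time, so only the current line must be in memory.
foreach (var line in File.ReadLines(@"C:\data\huge.txt")) { /* process */ }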
Upvotes: 3
Reputation: 67
Is it essential that you read each text file in completely with ReadAllLines? This will consume a large amount of memory. Why not read x lines from the file, save them to the database, and then continue until the end of the file is reached?
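A minimal sketch of that idea, streaming the file with File.ReadLines and flushing in fixed-size batches (batchSize and the SaveBatch helper are hypothetical, not from the original answer):

const int batchSize = 1000; // tune to your memory budget
var batch = new List<string>(batchSize);

foreach (var line in File.ReadLines(path))
{
    batch.Add(line);
    if (batch.Count == batchSize)
    {
        SaveBatch(batch); // hypothetical helper that writes the batch to the DB
        batch.Clear();
    }
}

if (batch.Count > 0)
{
    SaveBatch(batch); // flush the final partial batch
}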
Upvotes: 4