Reputation: 327
I have a web form with an upload button that accepts a CSV file; my code then needs to parse the file and insert the parsed data into a SQL table. Is what I'm doing below the right way to parse the data into a List? It's not picking up the filename for the StreamReader. Is this the most effective way to parse the data, or should I parse into a DataTable instead?
protected void UploadBtn_Click(object sender, EventArgs e)
{
    if (FileUpload.HasFile)
    {
        string filename = Path.GetFileName(FileUpload.FileName);
        List<string[]> ValuesToUpload = parseData(filename);

        //if (!Directory.Exists(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim()))
        //{
        //    Directory.CreateDirectory(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim());
        //}
        //FileUpload.SaveAs(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim() + filename);
        //using (FileStream stream = new FileStream(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim() + filename, FileMode.Open, FileAccess.Read, FileShare.Read))
    }
}
public List<string[]> parseData(string filename)
{
    int j = 0;
    List<string[]> members = new List<string[]>();
    try
    {
        using (StreamReader read = new StreamReader(filename))
        {
            while (!read.EndOfStream)
            {
                string line = read.ReadLine();
                string[] values = line.Split(',');
                if (j == 0)
                {
                    // skip the header row
                    j++;
                    continue;
                }
                long memnbr = Convert.ToInt64(values[0]);
                int loannbr = Convert.ToInt32(values[1]);
                int propval = Convert.ToInt32(values[2]);
                members.Add(values);
            }
        }
    }
    catch (Exception)
    {
        // TODO: log/handle read or conversion errors
    }
    return members;
}
Upvotes: 1
Views: 2927
Reputation: 20571
The code sample below will bulk-insert CSV data into a staging table that has matching columns and then execute a stored procedure to normalize the data on the server.
This is significantly more efficient than manually parsing the data and inserting it line by line. A few months ago I used similar code to submit 1,500,000+ records to our database and normalize the data in a matter of seconds.
var sqlConnection = new SqlConnection(DbConnectionStringInternal);

// Bulk-import our unnormalized data from the .csv file into a staging table
var inputFileConnectionString = String.Format("Driver={{Microsoft Text Driver (*.txt; *.csv)}};Extensions=csv;Readonly=True;Dbq={0}", Path.GetDirectoryName(csvFilePath));
using (var inputFileConnection = new OdbcConnection(inputFileConnectionString))
{
    inputFileConnection.Open();

    var selectCommandText = String.Format("SELECT * FROM {0}", Path.GetFileName(csvFilePath));
    var selectCommand = new OdbcCommand(selectCommandText, inputFileConnection);
    var inputDataReader = selectCommand.ExecuteReader(CommandBehavior.CloseConnection);

    var sqlBulkCopy = new SqlBulkCopy(sqlConnection) { DestinationTableName = "Data_Staging" };
    if (sqlConnection.State != ConnectionState.Open)
        sqlConnection.Open();

    sqlBulkCopy.WriteToServer(inputDataReader);
}

// Run a stored procedure to normalize the data in the staging table, then efficiently move it across to the "real" tables.
var addDataFromStagingTable = "EXEC SP_AddDataFromStagingTable";
if (sqlConnection.State != ConnectionState.Open)
    sqlConnection.Open();

using (var addToStagingTableCommand = new SqlCommand(addDataFromStagingTable, sqlConnection) { CommandTimeout = 60 * 20 })
    addToStagingTableCommand.ExecuteNonQuery();

sqlConnection.Close();
Upvotes: 0
Reputation: 26424
Use KBCsv. We are getting 40K rows parsed per second and 70K+ rows skipped per second; it is the fastest I have seen, and also pretty stable. Then generate the SQL manually, as suggested above. If you are doing a data reload and aiming for performance, run it multi-threaded with no transaction (MS SQL only); you can get up to 10K rows per second of import speed, depending on your network bandwidth to the database server.
Do not parse to a DataTable; it is very slow.
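For illustration, here is a rough sketch of that multi-threaded, no-transaction import. It parses with plain File.ReadLines/LINQ rather than KBCsv so the snippet stays self-contained; the Members table, its column names, and YOUR_CONNECTION_STRING are placeholders, and the three CSV fields are assumed to be numeric as in the question.
// Sketch only: batch the parsed rows and insert the batches in parallel,
// each on its own connection and with no explicit transaction.
// Requires System, System.Collections.Generic, System.Data.SqlClient,
// System.IO, System.Linq, System.Text and System.Threading.Tasks.
List<string[]> rows = File.ReadLines(csvFilePath)
                          .Skip(1)                         // skip the header row
                          .Select(line => line.Split(','))
                          .ToList();

const int batchSize = 1000;                                // rows per generated INSERT

var batches = rows.Select((row, i) => new { row, i })
                  .GroupBy(x => x.i / batchSize, x => x.row);

Parallel.ForEach(batches, new ParallelOptions { MaxDegreeOfParallelism = 4 }, batch =>
{
    // Build one multi-row INSERT per batch (fields assumed numeric, as in the question).
    var sql = new StringBuilder("INSERT INTO Members (MemberNbr, LoanNbr, PropVal) VALUES ");
    sql.Append(String.Join(",", batch.Select(v => String.Format("({0},{1},{2})", v[0], v[1], v[2]))));

    using (var cn = new SqlConnection(YOUR_CONNECTION_STRING))
    using (var cmd = new SqlCommand(sql.ToString(), cn))
    {
        cn.Open();
        cmd.ExecuteNonQuery();
    }
});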
Upvotes: 1
Reputation: 120460
Not really an answer, but too long to post as a comment...
As it looks like you're throwing away your parsed values (memnbr, etc.), you could significantly reduce your CSV parsing code to:
return File
    .ReadLines(filename)
    .Skip(1)
    .Select(line => line.Split(','))
    .ToList();
Upvotes: 0
Reputation: 238
Since you're going to insert the data into the SQL table, I'd first create a class that represents the table and create a new object for each record (this is for visibility).
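For example, a minimal sketch of such a class (the property names below are guesses based on the memnbr, loannbr and propval fields parsed in the question):
// Hypothetical record class; adjust the names and types to match your actual table.
public class Member
{
    public long MemberNbr { get; set; }   // values[0]
    public int LoanNbr { get; set; }      // values[1]
    public int PropVal { get; set; }      // values[2]
}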
Or you could use one of the following approaches (assuming you're using MS SQL Server).
1. The Dynamic Insert Query
StringBuilder strInsertValues = new StringBuilder("VALUES");

// your parsing code here...
string[] values = line.Split(',');
strInsertValues.AppendFormat("({0},{1},{2}),", values[0], values[1], values[2]);
// ...end of parsing

using (SqlConnection cn = new SqlConnection(YOUR_CONNECTION_STRING))
{
    SqlCommand cmd = cn.CreateCommand();
    cmd.CommandType = CommandType.Text;
    // Substring(0, Length - 1) trims the trailing comma left by AppendFormat
    cmd.CommandText = "INSERT INTO TABLE(Column1, Column2, Column3) "
                      + strInsertValues.ToString().Substring(0, strInsertValues.Length - 1);
    cn.Open();
    cmd.ExecuteNonQuery();
}
2. Use BulkCopy (Recommended)
- Create a DataSet that represents your CSV values
- Add a new record for each line parsed
- Create column mappings between your DataSet and the SQL table
- Use a SqlBulkCopy object to insert your data (see the sketch after this list)
Ref to SqlBulkCopy: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
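A minimal sketch of that BulkCopy approach, using a DataTable (which is what SqlBulkCopy.WriteToServer accepts); the Members table, its column names and YOUR_CONNECTION_STRING are placeholders, and the CSV columns are assumed to be in the memnbr, loannbr, propval order from the question.
// Hypothetical example: load the CSV into a DataTable, map its columns to the
// destination table, and bulk-insert. Requires System, System.Data,
// System.Data.SqlClient, System.IO and System.Linq.
var table = new DataTable();
table.Columns.Add("MemberNbr", typeof(long));
table.Columns.Add("LoanNbr", typeof(int));
table.Columns.Add("PropVal", typeof(int));

foreach (var values in File.ReadLines(csvFilePath).Skip(1).Select(l => l.Split(',')))
{
    table.Rows.Add(Convert.ToInt64(values[0]), Convert.ToInt32(values[1]), Convert.ToInt32(values[2]));
}

using (var connection = new SqlConnection(YOUR_CONNECTION_STRING))
using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "Members" })
{
    bulkCopy.ColumnMappings.Add("MemberNbr", "MemberNbr");
    bulkCopy.ColumnMappings.Add("LoanNbr", "LoanNbr");
    bulkCopy.ColumnMappings.Add("PropVal", "PropVal");

    connection.Open();
    bulkCopy.WriteToServer(table);
}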
Upvotes: 0