Reputation: 587
I have a table full of stock price data. Each row has a unique combination of ticker symbol and date. I load new data all the time from CSV files that contain stock price data for every day for every ticker. I know that the CSV files contain rows that are already in my table. I only want to add the data that is not already in my data table. What is the quickest way to do this?
Should I try to add every row and catch each exception? Or should I compare each row against my data table by querying the table to see whether that row already exists? Or is there another alternative?
Additional Info
This is what I have been doing. For each line in the CSV file I read my data table to see if it already exists.
Dim strURL As String
Dim strBuffer As String
strURL = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
strBuffer = RequestWebData(strURL)
Dim sReader As New StringReader(strBuffer)
Dim List As New List(Of String)
Do While sReader.Peek >= 0
List.Add(sReader.ReadLine)
Loop
List.RemoveAt(0)
Dim lines As String() = List.ToArray
sReader.Close()
For Each line In lines
Dim checkDate = line.Split(",")(0).Trim()
Dim dr As OleDbDataReader
Dim cmd2 As New OleDb.OleDbCommand("SELECT * FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?", con)
cmd2.Parameters.AddWithValue("?", tickerValue)
cmd2.Parameters.AddWithValue("?", checkDate)
dr = cmd2.ExecuteReader
If Not dr.Read() Then
Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = checkDate
cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = line.Split(",")(1).Trim
cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = line.Split(",")(2).Trim
cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = line.Split(",")(3).Trim
cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = line.Split(",")(4).Trim
cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = line.Split(",")(5).Trim
cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = line.Split(",")(6).Trim
cmd3.ExecuteNonQuery()
End If
dr.Close()
Next
This is what I have switched to, and it throws this exception: "The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again."
I could catch this exception every time and ignore it until I hit a line that is new.
Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
Debug.WriteLine(strURL)
Dim strBuffer As String = RequestWebData(strURL)
Using streamReader = New StringReader(strBuffer)
Using reader = New CsvReader(streamReader)
reader.ReadHeaderRecord()
While reader.HasMoreRecords
Dim dataRecord As DataRecord = reader.ReadDataRecord()
Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
cmd3.ExecuteNonQuery()
End While
End Using
End Using
I just want to use the most efficient method.
Update
Per the answers below, this is the code I have so far:
Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
Dim strBuffer As String = RequestWebData(strURL)
Using streamReader = New StringReader(strBuffer)
Using reader = New CsvReader(streamReader)
' the CSV file has a header record, so we read that first
reader.ReadHeaderRecord()
While reader.HasMoreRecords
Dim dataRecord As DataRecord = reader.ReadDataRecord()
Dim cmd3 As OleDbCommand = New OleDbCommand( _
"INSERT INTO " & tblName & _
" (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
"SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
"FROM DUAL " & _
"WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?)", con)
cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
cmd3.ExecuteNonQuery()
End While
End Using
End Using
It gives me this error: "Data type mismatch in criteria expression."
Upvotes: 1
Views: 241
Reputation: 2982
Most DBMS support a (non-standard) clause for the INSERT command to ignore duplicates, e.g.:
MySQL: INSERT IGNORE INTO ...
SQLite: INSERT OR IGNORE INTO ...
This is the quickest way in non-batch mode, as you don't have to read the database before you write.
You can do the same with standard SQL using:
INSERT INTO ...
SELECT <your values>
WHERE NOT EXISTS ( <query for your values by id> );
Or (when you explicitly need a FROM clause):
INSERT INTO ...
SELECT <your values>
FROM DUAL
WHERE NOT EXISTS ( <query for your values by id> );
EDIT
MS Access does not have a built-in DUAL table (i.e., a table that always contains just one single row), but Access requires a FROM clause. So you have to build your own DUAL table:
CREATE TABLE DUAL (DUMMY INTEGER);
INSERT INTO DUAL VALUES (1);
You just do this once and for all. Then, in your code you would do inserts like
INSERT INTO MyTable (A,B,C,D)
SELECT 123, 456, 'Hello', 'World'
FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456);
Thus, for your example, use:
Dim cmd3 As OleDbCommand = New OleDbCommand( _
"INSERT INTO " & tblName & _
" (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
"SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
"FROM DUAL " & _
"WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ? AND ...)", con)
(WHERE clause depending on your key columns)
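For completeness, here is a rough sketch of how the ten placeholders could be bound. OleDb matches "?" placeholders purely by position, so the key values have to be added a second time for the subquery. The sketch also uses typed parameters instead of VarChar everywhere; whether that is what causes the "Data type mismatch in criteria expression" error is only a guess, not something confirmed here. The module and function names, the column types (Date/Time for [Date], numeric for the price columns and Volume), and the yyyy-MM-dd date format are all assumptions, so adjust them to your schema and CSV layout.

```vbnet
Imports System
Imports System.Data.OleDb
Imports System.Globalization

Module InsertIfMissingSketch

    ' Builds the conditional INSERT once so it can be reused for every record.
    ' "?" placeholders are bound by position: eight values for the SELECT list,
    ' then Ticker and Date again for the NOT EXISTS subquery.
    Function BuildCommand(con As OleDbConnection, tblName As String) As OleDbCommand
        Dim sql As String = _
            "INSERT INTO " & tblName & _
            " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
            "SELECT ?, ?, ?, ?, ?, ?, ?, ? FROM DUAL " & _
            "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & _
            " WHERE Ticker = ? AND [Date] = ?)"
        Dim cmd As New OleDbCommand(sql, con)

        ' Assumed column types (hypothetical; match them to your own table).
        cmd.Parameters.Add("Ticker", OleDbType.VarChar)
        cmd.Parameters.Add("Date", OleDbType.Date)
        For Each name As String In New String() {"Open", "High", "Low", "Close", "Volume", "AdjClose"}
            cmd.Parameters.Add(name, OleDbType.Double)
        Next
        cmd.Parameters.Add("TickerKey", OleDbType.VarChar)
        cmd.Parameters.Add("DateKey", OleDbType.Date)
        Return cmd
    End Function

    ' Fills the ten parameters from one CSV record (Date, Open, High, Low,
    ' Close, Volume, Adj Close) and executes the command.
    ' Returns True when ExecuteNonQuery reports at least one affected row.
    Function InsertIfMissing(cmd As OleDbCommand, ticker As String, fields As String()) As Boolean
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        ' Assumes dates arrive as yyyy-MM-dd.
        Dim day As Date = Date.ParseExact(fields(0).Trim(), "yyyy-MM-dd", inv)

        cmd.Parameters(0).Value = ticker
        cmd.Parameters(1).Value = day
        For i As Integer = 1 To 6
            cmd.Parameters(i + 1).Value = Double.Parse(fields(i).Trim(), inv)
        Next
        ' Same key values again, for the subquery placeholders.
        cmd.Parameters(8).Value = ticker
        cmd.Parameters(9).Value = day

        Return cmd.ExecuteNonQuery() > 0
    End Function

End Module
```

You would call BuildCommand(con, tblName) once before the loop and then InsertIfMissing(cmd, tickerValue, line.Split(","c)) for each data line. Reusing one command avoids creating a new OleDbCommand for every record.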
Upvotes: 1