gromit1

Reputation: 587

Quickest Way to Add New Rows to Datatable That Could Contain Duplicates

I have a table full of stock price data. Each row has a unique combination of ticker symbol and date. I regularly load new data from CSV files that contain daily stock prices for every ticker. I know that the CSV files contain rows that are already in my table. I only want to add the data that is not already in my data table. What is the quickest way to do this?

Should I try to add every row and catch each exception? Or should I check each row against my data table first to see whether that row already exists? Or is there another alternative?

Additional Info

This is what I have been doing. For each line in the CSV file I read my data table to see if it already exists.

    Dim strURL As String
    Dim strBuffer As String
    strURL = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    strBuffer = RequestWebData(strURL)
    Dim sReader As New StringReader(strBuffer)
    Dim List As New List(Of String)
    Do While sReader.Peek >= 0
        List.Add(sReader.ReadLine)
    Loop
    List.RemoveAt(0)
    Dim lines As String() = List.ToArray
    sReader.Close()
    For Each line In lines
        Dim checkDate = line.Split(",")(0).Trim()
        Dim dr As OleDbDataReader
        Dim cmd2 As New OleDb.OleDbCommand("SELECT * FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?", con)
        cmd2.Parameters.AddWithValue("?", tickerValue)
        cmd2.Parameters.AddWithValue("?", checkDate)
        dr = cmd2.ExecuteReader
        If Not dr.Read() Then
            Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = checkDate
            cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = line.Split(",")(1).Trim
            cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = line.Split(",")(2).Trim
            cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = line.Split(",")(3).Trim
            cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = line.Split(",")(4).Trim
            cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = line.Split(",")(5).Trim
            cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = line.Split(",")(6).Trim
            cmd3.ExecuteNonQuery()
        End If
        dr.Close()
    Next

This is what I have switched to. It throws this exception: "The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again." I could catch this exception every time and ignore it until I hit a line that is new.

    Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Debug.WriteLine(strURL)
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            reader.ReadHeaderRecord()
            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

I just want to use the most efficient method.

Update

Per the answers below, this is the code I have so far:

    Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            ' the CSV file has a header record, so we read that first
            reader.ReadHeaderRecord()

            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & "(Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & "FROM DUAL " & "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

It gives me this error: "Data type mismatch in criteria expression."

Upvotes: 1

Views: 241

Answers (1)

Fabian

Reputation: 2982

Most DBMSs support a (non-standard) clause for the INSERT command that ignores duplicates, e.g.:

MySQL: INSERT IGNORE INTO ...

SQLite: INSERT OR IGNORE INTO ...

This is the quickest way in non-batch mode, because you don't have to query the database in a separate round trip before each write.
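As a rough illustration of the SQLite variant, here is a minimal sketch using Python's built-in sqlite3 module. Python is used only because it runs without a database server; it is not the asker's VB.NET/Access setup. The table name `prices`, its columns, and the helper `insert_ignoring_duplicates` are made up for the example, and the sketch assumes a composite primary key on ticker and date.

```python
import sqlite3

def insert_ignoring_duplicates(conn, rows):
    """Insert (ticker, trade_date, close_price) rows; return how many were added."""
    count_sql = "SELECT COUNT(*) FROM prices"
    before = conn.execute(count_sql).fetchone()[0]
    # OR IGNORE makes SQLite skip any row that would violate the primary key,
    # so no SELECT is needed before each INSERT.
    conn.executemany(
        "INSERT OR IGNORE INTO prices (ticker, trade_date, close_price) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before

# Hypothetical schema: one row per (ticker, trade_date).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    " ticker TEXT, trade_date TEXT, close_price REAL,"
    " PRIMARY KEY (ticker, trade_date))"
)
conn.execute("INSERT INTO prices VALUES ('ABC', '2014-01-02', 10.0)")
conn.commit()

batch = [
    ("ABC", "2014-01-02", 10.0),  # already stored
    ("ABC", "2014-01-03", 10.5),  # new
    ("ABC", "2014-01-03", 10.5),  # repeated inside the same batch
]
added = insert_ignoring_duplicates(conn, batch)
total = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(added, total)
```

The duplicate check is done by the primary key index inside the database engine, which is why this form needs a unique key on the columns that define a duplicate.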

You can do the same with standard SQL using:

INSERT INTO ... 
SELECT <your values> 
WHERE NOT EXISTS ( <query for your values by id> );

Or (when you explicitly need a FROM clause):

INSERT INTO ... 
SELECT <your values> 
FROM DUAL 
WHERE NOT EXISTS ( <query for your values by id> );
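The guarded insert above can be tried out with a small sketch, again using Python's built-in sqlite3 module purely as a convenient stand-in (SQLite accepts the form without a FROM clause). The `prices` table, its columns, and `insert_if_missing` are hypothetical names. The table deliberately has no unique key, to show that the NOT EXISTS guard alone keeps duplicates out. Note that the key values are bound twice: once for the SELECT list and once for the subquery.

```python
import sqlite3

GUARDED_INSERT = (
    "INSERT INTO prices (ticker, trade_date, close_price) "
    "SELECT ?, ?, ? "
    "WHERE NOT EXISTS ("
    " SELECT 1 FROM prices WHERE ticker = ? AND trade_date = ?)"
)

def insert_if_missing(conn, ticker, trade_date, close_price):
    """Insert one row unless a row with the same (ticker, trade_date) exists."""
    # Five positional parameters: three values to insert, two for the key lookup.
    conn.execute(
        GUARDED_INSERT,
        (ticker, trade_date, close_price, ticker, trade_date),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY or UNIQUE constraint here, on purpose.
conn.execute(
    "CREATE TABLE prices (ticker TEXT, trade_date TEXT, close_price REAL)"
)

insert_if_missing(conn, "ABC", "2014-01-02", 10.0)
insert_if_missing(conn, "ABC", "2014-01-02", 10.0)  # same key: skipped
insert_if_missing(conn, "ABC", "2014-01-03", 10.5)

row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(row_count)
```

An index on the key columns is still worth having, since the subquery has to look up the key for every inserted row.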

EDIT

MS Access does not have a built-in DUAL table (i.e., a table that always contains just one single row), but Access requires a FROM clause. So you have to build your own DUAL table:

CREATE TABLE DUAL (DUMMY INTEGER);
INSERT INTO DUAL VALUES (1);

You only need to do this once. Then, in your code, you would do inserts like:

INSERT INTO MyTable (A,B,C,D)
SELECT 123, 456, 'Hello', 'World'
FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456);

Thus, for your example, use:

Dim cmd3 As OleDbCommand = New OleDbCommand( _
    "INSERT INTO " & tblName & _
    " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
    "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
    "FROM DUAL " & _
    "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ? AND ...)", con)

(The WHERE clause depends on your key columns.)
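To see why the home-made DUAL table should hold exactly one row, here is a small experiment, sketched with Python's built-in sqlite3 module rather than Access, so the behaviour shown is SQLite's and is only assumed to carry over. `rows_inserted_with_dual` is a hypothetical helper, and `MyTable` has no unique key so that nothing but the query shape decides how many rows land in it.

```python
import sqlite3

def rows_inserted_with_dual(dual_row_count):
    """Run the guarded insert once against a DUAL table with the given row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DUAL (DUMMY INTEGER)")
    conn.executemany("INSERT INTO DUAL VALUES (?)", [(1,)] * dual_row_count)
    # No unique key on MyTable, so only the SELECT controls the outcome.
    conn.execute("CREATE TABLE MyTable (A INTEGER, B INTEGER)")
    conn.execute(
        "INSERT INTO MyTable (A, B) "
        "SELECT 123, 456 FROM DUAL "
        "WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456)"
    )
    count = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
    conn.close()
    return count

# The SELECT yields one output row per DUAL row, so the insert count follows it.
results = {n: rows_inserted_with_dual(n) for n in (0, 1, 2)}
print(results)
```

An empty DUAL table makes the insert a silent no-op, and extra rows in DUAL multiply the inserted values, so it is worth making sure nothing else ever writes to that table.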

Upvotes: 1
