Reputation: 262
I am trying to read an Excel 2013 file (.xlsx which is about 100 MB in size) using Visual Basic.NET with the help of OleDB. The main concern was getting a system out of memory exception in the line:
da.Fill(dt)
from the code below.
Private Function ReadExcelFile() As DataSet
Dim ds As New DataSet()
Dim connectionString As String =
"Provider=Microsoft.ACE.OLEDB.12.0;;Extended Properties=Excel 12.0 XML;Data Source=C:\file.xlsx;"
Using connection As New OleDbConnection(connectionString)
connection.Open()
Dim cmd As New OleDbCommand()
cmd.Connection = connection
Dim dtSheet As DataTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
For Each dr As DataRow In dtSheet.Rows
Dim sheetName As String = dr("TABLE_NAME").ToString()
If Not sheetName.EndsWith("$") Then
Continue For
End If
cmd.CommandText = "SELECT * FROM [" & sheetName & "];"
Dim dt As New DataTable()
dt.TableName = sheetName
Dim da As New OleDbDataAdapter(cmd)
da.Fill(dt)
ds.Tables.Add(dt)
Next
cmd = Nothing
connection.Close()
End Using
Return ds
End Function
but I think the best solution is to read the data by chunks so I found out that I could read the data by adding a column range in the SQL statement like this:
cmd.CommandText = "SELECT * FROM [" & sheetName & "B1:B10];"
I did a loop by doing increments on that range but I found an error. With this for example,
cmd.CommandText = "SELECT * FROM [" & sheetName & "B50000:B51000];"
It still works. However, if I do,
cmd.CommandText = "SELECT * FROM [" & sheetName & "B70000:B70001];"
I get this error.
Note that the Excel file has 475128 rows and B70000-B70001 isn't even half the total.
Could anyone shed some light? I think I'm missing something here.
Upvotes: 2
Views: 1293
Reputation: 262
I found a working solution. Instead of using DataSet, use DataReader. I can just add a worker so it won't hang up.
Private Function ReadExcelFile() As DataSet
Dim ds As New DataSet()
Dim connectionString As String = GetConnectionString()
Using connection As New OleDbConnection(connectionString)
connection.Open()
Dim cmd As New OleDbCommand()
cmd.Connection = connection
Dim dtSheet As DataTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
For Each dr As DataRow In dtSheet.Rows
Dim sheetName As String = dr("TABLE_NAME").ToString()
If Not sheetName.EndsWith("$") Then
Continue For
End If
cmd.CommandText = "SELECT * FROM [" & sheetName & "];"
Dim ddr As OleDbDataReader = cmd.ExecuteReader()
Dim counter As Integer = 0
While (ddr.Read())
MessageBox.Show(ddr.GetValue(0))
End While
Next
cmd = Nothing
connection.Close()
End Using
Return ds
End Function
The lines:
Dim ddr As OleDbDataReader = cmd.ExecuteReader()
Dim counter As Integer = 0
While (ddr.Read())
MessageBox.Show(ddr.GetValue(0))
End While
are the essential code where you can access the rows of the first column (index 0). This works because I read that DataSet is an in-memory object (Which is the reason why we possible can get a system out of memory exception) - Check here for reference
I'd still want to hear about why my above problem is popping up.
Upvotes: 1