Reputation: 31
I have some very large .csv files that I want to query and pull entries out. I've set this up using ADO in Excel 2016 successfully, or so it seemed, and remarkably fast. I am looking to query a field ID using;
SELECT * FROM <file> WHERE ID = #.
This has worked fine up until the ID numbers which contain letters, i.e. 960545H4. These appear much further down the file, around ~400k rows in.
I believe the problem is that excel is assuming this field (under the header ID) is numeric based one the first x number of entries. I want this to be set to text.
Is there anyway to set the datatype of the field/column so that I can query successfully?
Sub testSQL()
Dim xlcon as ADOB.Connection
Dim xlrs as ADOB.RecordSet
Dim nextRow as Integer
Dim datafilepath as String
Dim datafilename as String
Set xlcon = New ADOB.Connection
Set xlrs = New ADOB.RecordSet
datafilepath = "U:\Common\"
datafilename = "test_file"
xlcon.Provider = "Microsoft.Jet.OLEDB.4.0"
xlcon.ConnectionString = "Data Source=" & datafilepath & ";" & "Extended Properties=""text;HDR=Yes;FMT=Delimited,"""
xlcon.Open
xlrs.Open "SELECT * FROM [" & datafilename & ".csv] WHERE ID = '023487562HH'", xlcon
'the rest of the code...
I was hoping something slong the lines of;
SELECT * FROM [file] WHERE CStr(ID) = 34897562FD
but this doesn't seem to work.
Any help is greatly appreciated!!
Upvotes: 1
Views: 1251
Reputation: 12167
One could take Killuminati's solution and add also the field names as needed. The parameter MaxScanRows
just makes sure that the OLEDB driver will scan the whole file as stated also in the documentation
The data types of the fields can also be determined. Use the MaxScanRows option to indicate how many rows should be scanned when determining the column types. If you set MaxScanRows to 0, the whole file is scanned. The MaxScanRows setting in Schema.ini overrides the setting in the Windows Registry, file by file.
In order to declare the data type for each field one could use a schema.ini like that
[test_file.csv]
Format=Delimited(;)
DecimalSymbol=,
ColNameHeader=False
Col1=Field1 Text
Col2=Field2 Text
Col3=Field3 Text
Col4=StartDate Date
Col5=EndDate Date
Col6=Price Currency
The entry after the equal sign is name of the field resp. column. If the file contains a header it can be the same but does not have to but it is required.
Code could look like that
Sub ADO()
Dim rs As New ADODB.Recordset
Dim conn As New ADODB.Connection
Dim myPath As String
myPath = ThisWorkbook.Path & "\TextFiles\"
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & myPath & ";Extended Properties=""text;HDR=No;FMT=Delimited()"";"
With rs
.ActiveConnection = conn
.CursorType = adOpenKeyset
.LockType = adLockOptimistic
.Open "SELECT * FROM [test_file.csv]"
.AddNew
.Fields("Field1") = "Doe"
.Fields("Field2") = "John"
.Fields("Field3") = "123456"
.Fields("StartDate") = Date
.Fields("EndDate") = "05.10.2018"
.Fields("Price") = 1234.56
.Update
End With
conn.Close
End Sub
Upvotes: 2
Reputation: 31
Thanks to Storax's recommendation on using a schema file I have got this working.
I created a schema file in the same location as the file. The file looked like;
[test_file.csv]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=0
I can now return the IDs I am looking for! Many thanks to Storax for this solution!
Upvotes: 2