Reputation: 31

ADODB imported .csv change data type

I have some very large .csv files that I want to query and pull entries out. I've set this up using ADO in Excel 2016 successfully, or so it seemed, and remarkably fast. I am looking to query a field ID using;

SELECT * FROM <file> WHERE ID = #.

This has worked fine up until the ID numbers which contain letters, i.e. 960545H4. These appear much further down the file, around ~400k rows in.

I believe the problem is that excel is assuming this field (under the header ID) is numeric based one the first x number of entries. I want this to be set to text.

Is there anyway to set the datatype of the field/column so that I can query successfully?

Sub testSQL()

Dim xlcon as ADOB.Connection
Dim xlrs as ADOB.RecordSet
Dim nextRow as Integer
Dim datafilepath as String
Dim datafilename as String

Set xlcon = New ADOB.Connection
Set xlrs = New ADOB.RecordSet
datafilepath = "U:\Common\"
datafilename = "test_file"

xlcon.Provider = "Microsoft.Jet.OLEDB.4.0"
xlcon.ConnectionString = "Data Source=" & datafilepath & ";" & "Extended Properties=""text;HDR=Yes;FMT=Delimited,"""

xlcon.Open

xlrs.Open "SELECT * FROM [" & datafilename & ".csv] WHERE ID = '023487562HH'", xlcon

'the rest of the code...

I was hoping something slong the lines of;

SELECT * FROM [file] WHERE CStr(ID) = 34897562FD

but this doesn't seem to work.

Any help is greatly appreciated!!

Upvotes: 1

Answers (2)

Storax

Reputation: 12167

One could take Killuminati's solution and add also the field names as needed. The parameter MaxScanRows just makes sure that the OLEDB driver will scan the whole file as stated also in the documentation

The data types of the fields can also be determined. Use the MaxScanRows option to indicate how many rows should be scanned when determining the column types. If you set MaxScanRows to 0, the whole file is scanned. The MaxScanRows setting in Schema.ini overrides the setting in the Windows Registry, file by file.

In order to declare the data type for each field one could use a schema.ini like that

[test_file.csv]
Format=Delimited(;)
DecimalSymbol=,
ColNameHeader=False
Col1=Field1 Text 
Col2=Field2 Text 
Col3=Field3 Text 
Col4=StartDate Date
Col5=EndDate Date
Col6=Price Currency

The entry after the equal sign is name of the field resp. column. If the file contains a header it can be the same but does not have to but it is required.

Code could look like that

Sub ADO()
Dim rs As New ADODB.Recordset
Dim conn As New ADODB.Connection
Dim myPath As String
    myPath = ThisWorkbook.Path & "\TextFiles\"
    conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & myPath & ";Extended Properties=""text;HDR=No;FMT=Delimited()"";"
    With rs
        .ActiveConnection = conn
        .CursorType = adOpenKeyset
        .LockType = adLockOptimistic
        .Open "SELECT * FROM [test_file.csv]"
        .AddNew
        .Fields("Field1") = "Doe"
        .Fields("Field2") = "John"
        .Fields("Field3") = "123456"
        .Fields("StartDate") = Date
        .Fields("EndDate") = "05.10.2018"
        .Fields("Price") = 1234.56
        .Update
    End With
    conn.Close
End Sub

Upvotes: 2

Killuminati

Reputation: 31

Thanks to Storax's recommendation on using a schema file I have got this working.

I created a schema file in the same location as the file. The file looked like;

[test_file.csv]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=0

I can now return the IDs I am looking for! Many thanks to Storax for this solution!

Upvotes: 2

ADODB imported .csv change data type

Answers (2)

Related Questions