Reputation: 17
`` Hello everyone, I am working on a deep learning project. The data I will use for the project consists of multiple excel files. Since I will be using the pd.read_csv command of the Pandas library, I used a VBA code that automatically converts all excel files to csv format.
Here is the VBA CODE: (xlsx to csv)
Sub WorkbooksSaveAsCsvToFolder()
'UpdatebyExtendoffice20181031
Dim xObjWB As Workbook
Dim xObjWS As Worksheet
Dim xStrEFPath As String
Dim xStrEFFile As String
Dim xObjFD As FileDialog
Dim xObjSFD As FileDialog
Dim xStrSPath As String
Dim xStrCSVFName As String
Dim xS As String
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
Application.DisplayAlerts = False
On Error Resume Next
Set xObjFD = Application.FileDialog(msoFileDialogFolderPicker)
xObjFD.AllowMultiSelect = False
xObjFD.Title = "Kutools for Excel - Select a folder which contains Excel files"
If xObjFD.Show <> -1 Then Exit Sub
xStrEFPath = xObjFD.SelectedItems(1) & "\"
Set xObjSFD = Application.FileDialog(msoFileDialogFolderPicker)
xObjSFD.AllowMultiSelect = False
xObjSFD.Title = "Kutools for Excel - Select a folder to locate CSV files"
If xObjSFD.Show <> -1 Then Exit Sub
xStrSPath = xObjSFD.SelectedItems(1) & "\"
xStrEFFile = Dir(xStrEFPath & "*.xlsx*")
Do While xStrEFFile <> ""
xS = xStrEFPath & xStrEFFile
Set xObjWB = Application.Workbooks.Open(xS)
xStrCSVFName = xStrSPath & Left(xStrEFFile, InStr(1, xStrEFFile, ".") - 1) & ".csv"
xObjWB.SaveAs Filename:=xStrCSVFName, FileFormat:=xlCSV
xObjWB.Close savechanges:=False
xStrEFFile = Dir
Loop
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Application.ScreenUpdating = True
Application.DisplayAlerts = True
End Sub
With this code, thousands of .xlsx files become .csv. The problem here is that although the conversion happens correctly, when I use the pd.read_csv command, it only reads 1 column.
As it seems:
0
0 PlatformData,2,0.020000,43.000000,33.000000,32...
1 PlatformData,1,0.020000,42.730087,33.000000,25...
2 PlatformData,2,0.040000,43.000000,33.000000,32...
3 PlatformData,1,0.040000,42.730141,33.000006,25...
4 PlatformData,2,0.060000,43.000000,33.000000,32...
... ...
9520 PlatformData,1,119.520000,42.931132,33.056849,...
9521 PlatformData,1,119.540000,42.931184,33.056868,...
9522 PlatformData,1,119.560000,42.931184,33.056868,...
9523 PlatformData,1,119.580000,42.931237,33.056887,...
9524 PlatformData,1,119.600000,42.931237,33.056887,...
Because the column part is not correct, it combines the data and prevents me from training the model.
Afterwards, in order to understand what the problem was, I saw that the problem disappeared when I converted only 1 excel file to .csv format manually using the "Save as" command and read it using the pandas library.
Which looks like this:
0 1 2 3 4 5 6 7 8 9 10 11
0 PlatformData 2 0.02 43.000000 33.000000 3200.0 0.000000 0.0 0.0 0.000000 0.000000 -0.0
1 PlatformData 1 0.02 42.730087 33.000000 3050.0 60.000029 0.0 0.0 74.999931 129.903854 -0.0
2 PlatformData 2 0.04 43.000000 33.000000 3200.0 0.000000 -0.0 0.0 0.000000 0.000000 -0.0
3 PlatformData 1 0.04 42.730114 33.000064 3050.0 60.000029 0.0 0.0 74.999931 129.903854 -0.0
4 PlatformData 2 0.06 43.000000 33.000000 3200.0 0.000000 -0.0 0.0 0.000000 0.000000 -0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
57867 PlatformData 1 119.72 42.891333 33.019166 2550.0 5.000000 0.0 0.0 149.429214 13.073360 -0.0
57868 PlatformData 1 119.74 42.891333 33.019166 2550.0 5.000000 0.0 0.0 149.429214 13.073360 -0.0
57869 PlatformData 1 119.76 42.891387 33.019172 2550.0 5.000000 0.0 0.0 149.429214 13.073360 -0.0
57870 PlatformData 1 119.78 42.891387 33.019172 2550.0 5.000000 0.0 0.0 149.429214 13.073360 -0.0
57871 PlatformData 1 119.80 42.891441 33.019178 2550.0 5.000000 0.0 0.0 149.429214 13.073360 -0.0
As seen here, each comma is separated as a separate column.
I need to convert multiple files using VBA or some other convert technique because I have so many excel files. But as you can see, even though the format of the files is translated correctly, it is read incorrectly by pandas.
I've tried converting with a bunch of different VBA codes so far. Then I tried to read it with the read_excel command on python and then convert it with to_csv, but I encountered the same problem again. (Reading only 1 column)
What do I need to do to make it look like it was when I changed the format manually? Is there an error in the VBA code or do I need to implement another method for this operation?
Thank you for your interest. Thanks in advance for any help
Upvotes: 1
Views: 339
Reputation: 144
I'm not sure how to change your OS separator like @FunThomas suggested, perhaps you could instead specify the delimiter used for read_csv()
or writing out to_csv()
Have you tried specifying a delimiter? i.e.
import pandas as pd
df = pd.read_csv('Book1.csv', sep='\t')
print(df)
See more here: https://www.geeksforgeeks.org/pandas-dataframe-to-csv-file-using-tab-separator/
Note the link above shows to_csv, but the param sep
exists for read_csv
too. See docs here.
Upvotes: 1
Reputation: 29286
Dealing with CSV is a tricky thing (not only in Excel). "CSV" stands for "comma separated values", and Excel takes this literally: When you use SaveAs FileFormat:=xlCSV
, it will put a comma between your values. Except if you are using local setting on your computer that have a different separator defined, then Excel is using that separator (on my computer, for example, a semicolon).
Your Pandas seems to expect tab characters as separator. You could try SaveAs FileFormat:=xlText
or xlTextWindows
- on my computer that generated tab separated files, but I couldn't find a documentation telling that this is always the case. The alternative is to use a small routine that writes the file manually - see for example VBA code to save Excel sheet as tab-delimited text file
However, I doubt that you cannot bring Pandas to read comma separated files. According to https://pandas.pydata.org/docs/user_guide/io.html#io-read-csv-table, you should be able to define the separation character.
Upvotes: 1