Reputation: 29
I have a CSV file with more than 13 million rows that I want to convert to HDF5. I can run this code:
df_chunk = vx.from_csv(r'df.csv', nrows=20_000_000)
but if I run the following code:
df_chunk.export(r'df.hdf5')
I get this error:
AttributeError: 'DataFrameArrays' object has no attribute 'dtype'
The same error happens when I run:
df_chunk = vx.from_csv(r'df.csv', convert='True', nrows=20_000_000)
Can you tell me what's wrong or how I can solve this? Thanks.
Upvotes: 0
Views: 1109
Reputation: 29
I downgraded Python to 3.7, reinstalled the new version of Vaex (4.0), and ran the code again; everything now works without error. Thank you for all the attention and help I have gotten.
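For anyone else hitting this, here is roughly the working setup as a minimal sketch (the same calls from the question, run under Python 3.7 with Vaex 4.0; convert=True is also expected to write an HDF5 copy next to the CSV, so the explicit export is mainly shown for completeness):

import vaex as vx

# Read the large CSV under Python 3.7 / Vaex 4.0; nrows is the same
# argument used in the question and limits how many rows are read
df_chunk = vx.from_csv(r'df.csv', convert=True, nrows=20_000_000)

# Write the DataFrame out as HDF5
df_chunk.export(r'df.hdf5')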
Upvotes: 2
Reputation: 7996
The error message (object has no attribute 'dtype') is interesting. dtype is a NumPy thing (it describes the data type of a NumPy array). Maybe that's a clue.
I am not familiar with vaex, so I read their documentation. :-)
I noticed you didn't use the seperator parameter (note spelling is from the docs). If your values really are comma separated, you need seperator=",".
If that doesn't work, this might help. The vaex 4.0.0-dev0 documentation shows other ways to read a CSV file and create an HDF5 file. Have you tried vx.from_ascii()? The docs show this method:
ds = vx.from_ascii("table.csv", seperator=",", names=["x", "y", "z"])
Adding the names= parameter might help with the dtype message (if compound arrays are being used). Using that example, this might work (you will have to fill in the names in the list):
df_chunk = vx.from_ascii('df.csv', seperator=",", names=[--add your column names here--], nrows=20_000_000)
df_chunk.export('df.hdf5')
Note: I removed the r from the filename strings ('df.csv' instead of r'df.csv'). Not sure if that matters for this case.
Upvotes: 0