SophieLD

Reputation: 29

Converting CSV to HDF5 with vaex.from_csv: AttributeError: 'DataFrameArrays' object has no attribute 'dtype'

I have a CSV file with more than 13 million rows that I want to convert to HDF5. This code runs without problems:

import vaex as vx
df_chunk = vx.from_csv(r'df.csv', nrows=20_000_000)

but when I run the following code:

df_chunk.export(r'df.hdf5')

I get this error:

AttributeError: 'DataFrameArrays' object has no attribute 'dtype'

The same error happens when I run:

df_chunk = vx.from_csv(r'df.csv', convert='True', nrows=20_000_000)

Can you tell me what's wrong, or how I can solve this? Thanks.

Upvotes: 0

Views: 1109

Answers (2)

SophieLD

Reputation: 29

I downgraded Python to 3.7, reinstalled the newer version of vaex (4.0), and ran the code again; everything worked without errors. Thank you for all the attention and help I have received.
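Since the fix here came down to versions, it can help to confirm which Python and vaex versions are actually active before and after reinstalling. A minimal sketch using the standard library's importlib.metadata (available from Python 3.8; on 3.7 the separate importlib_metadata backport provides the same functions). The helper name environment_report is just for illustration:

```python
import sys
from importlib import metadata


def environment_report(package="vaex"):
    """Return (python_version, package_version); package_version is None if not installed."""
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = None
    return python_version, package_version


print(environment_report())  # e.g. ('3.8.10', '4.0.0') depending on your environment
```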

Upvotes: 2

kcw78

Reputation: 7996

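The error message (object has no attribute 'dtype') is interesting. dtype is a NumPy attribute (it describes the data type of a NumPy array's elements), so it looks like some code expected an array but received a DataFrameArrays object instead. Maybe that's a clue.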
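To see where a message like this comes from, here is a small illustration: NumPy arrays expose .dtype, and any other object accessed the same way raises exactly this kind of AttributeError. The DataFrameArrays class below is a hypothetical empty stand-in that only borrows the name from the error message; it is not vaex's real class.

```python
import numpy as np

# NumPy arrays carry a dtype describing their element type.
values = np.array([1.5, 2.5, 3.5])
print(values.dtype)  # float64


class DataFrameArrays:
    """Hypothetical stand-in: reuses only the class name from the error message."""


def dtype_of(obj):
    """Access obj.dtype the way code that expects a NumPy array would."""
    return obj.dtype


try:
    dtype_of(DataFrameArrays())
except AttributeError as err:
    message = str(err)

print(message)  # 'DataFrameArrays' object has no attribute 'dtype'
```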

I am not familiar with vaex, so I read their documentation. :-)

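I noticed you didn't specify a separator. For vx.from_ascii() the parameter is seperator (note the spelling is from the docs). vx.from_csv(), however, passes extra keyword arguments on to pandas.read_csv, where the parameter is sep and already defaults to ",", so if your values really are comma-separated this is probably not the cause.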
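If you're not sure the file really is comma-separated, you can check with the standard library's csv.Sniffer before calling vaex. A minimal sketch; the helper name detect_delimiter, the 4 KB sample size, and the candidate list are assumptions for illustration:

```python
import csv
import os
import tempfile


def detect_delimiter(path, candidates=",;\t|"):
    """Guess the field delimiter from the first few KB of a text file."""
    with open(path, newline="") as f:
        sample = f.read(4096)
    # Restrict guessing to common delimiters so digits/letters are never picked.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Demo on a small sample file written to a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="") as f:
    f.write("x,y,z\n1,2,3\n4,5,6\n")

print(detect_delimiter(sample_path))  # ,
```

On the real data you would call detect_delimiter('df.csv'); only the first 4 KB are read, so this is cheap even for a file with millions of rows.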

If that doesn't work, this might help. The vaex 4.0.0-dev0 documentation shows other ways to read a CSV file and create an HDF5 file. Have you tried vx.from_ascii()? The docs show this method:

ds = vx.from_ascii("table.csv", seperator=",", names=["x", "y", "z"])

Adding the names= parameter might help with the dtype message (if compound arrays are being used). Based on that example, something like this might work (you will have to fill in your own column names in the list):

import vaex as vx
# "col1", "col2", "col3" are placeholders: replace them with your actual column names
df_chunk = vx.from_ascii('df.csv', seperator=",", names=["col1", "col2", "col3"], nrows=20_000_000)
df_chunk.export('df.hdf5')
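Rather than typing the column names by hand, you could read them from the CSV's header row with the standard csv module and pass the resulting list as names=. A minimal sketch; the helper read_header is hypothetical, and since the header line is then still part of the file, check the from_ascii docs for how it treats that first row:

```python
import csv
import os
import tempfile


def read_header(path, delimiter=","):
    """Return the column names from the first row of a delimited text file."""
    with open(path, newline="") as f:
        return next(csv.reader(f, delimiter=delimiter))


# Demo on a small sample file; with the real data this would be read_header('df.csv').
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="") as f:
    f.write("id,price,qty\n1,9.5,3\n2,4.0,7\n")

names = read_header(sample_path)
print(names)  # ['id', 'price', 'qty']
```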

Note: I removed the r prefix from the filename strings ('df.csv' instead of r'df.csv'). It shouldn't matter here: the r prefix only changes how backslashes are interpreted, and 'df.csv' contains none.
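A quick illustration of when the r prefix does and doesn't matter. The Windows-style path below is a made-up example chosen because \n and \t are escape sequences in a normal string:

```python
# Without backslashes, the r prefix makes no difference.
print(r"df.csv" == "df.csv")  # True

# With backslashes it does: in a normal string "\n" is a newline and "\t" a tab.
raw = r"C:\new\table.csv"
plain = "C:\new\table.csv"
print(raw == plain)  # False
print(len(raw), len(plain))  # 16 14
```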

Upvotes: 0
