Reputation: 41
I load data from a CSV file with read_csv in pandas. I try to convert 6 columns to float32 and that works, but the category column is not converted.
I have checked my 'div' column and there is no problem with it:
df_concat['div'].unique()
array(['L', 'J', 'K', 'U', 'E', 'B', 'A', 'C', 'N', 'X', 'M', 'O', 'D',
'I', 'P', 'Q', 'S', 'R', 'T'], dtype=object)
I tried limiting the data with nrows=4000000 and the column was converted to the category dtype successfully! What's wrong with it?
This is my code:
import pandas as pd

names = ['bdate', 'nama_site', 'kode_store', 'div', 'merdivdesc', 'cat', 'catdesc', 'subcat', 'subcatdesc', 'brand', 'sku', 'sku_desc', 'tillcode', 'netsales', 'profit', 'margin', 'qty']
dtype = {
    'netsales': 'float32', 'profit': 'float32', 'margin': 'float32', 'qty': 'float32',
    'div': 'category'
}
# Read the file in chunks of 20000 rows, skipping the original header row
data = pd.read_csv('clean_jan20_minified.csv', sep='|', dtype=dtype, chunksize=20000, names=names, skiprows=[0], nrows=4000000)
chunk_list = []
for chunk in data:
    chunk_list.append(chunk)
# Combine all chunks into a single DataFrame
df_concat = pd.concat(chunk_list, ignore_index=True)
When I convert manually with df_concat['div'] = df_concat['div'].astype('category'), it works, but I need the conversion to happen in read_csv.
Upvotes: 2
Views: 418
Reputation: 16673
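It looks like you lose the category data type when using pd.concat. With chunksize, each chunk builds its own set of categories from the values it happens to contain, and concatenating categorical columns whose categories differ does not keep the category dtype.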
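A minimal sketch of that behaviour, using two small hand-made frames as stand-ins for chunks (the frames and their values are made up for illustration, not taken from the asker's file). It suggests the dtype survives pd.concat only when every piece has the same categories:

```python
import pandas as pd

# Two "chunks" whose 'div' columns are categorical but saw different values.
chunk_a = pd.DataFrame({"div": pd.Series(["L", "J", "K"], dtype="category")})
chunk_b = pd.DataFrame({"div": pd.Series(["L", "U"], dtype="category")})

# Categories differ, so the combined column is no longer categorical.
mismatched = pd.concat([chunk_a, chunk_b], ignore_index=True)
print(mismatched["div"].dtype)

# A chunk with the same set of categories as chunk_a keeps the dtype.
chunk_c = pd.DataFrame({"div": pd.Series(["J", "K", "L"], dtype="category")})
matched = pd.concat([chunk_a, chunk_c], ignore_index=True)
print(matched["div"].dtype)
```

This would also be consistent with a smaller nrows appearing to work, if every chunk in that range happened to contain the same set of values.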
See the passage just above the "General guidelines" section near the end of this article: https://pbpython.com/pandas_dtypes_cat.html
"In this case, the data is still there but the type has been converted to an object. Once again, this is pandas attempt to combine the data without throwing errors but not making assumptions. If you want to convert to a category data type now, you can use astype('category') ."
Also, you might want to try .reorder_categories per this post: pandas - concat with columns of same categories turns to object
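Another option, if the full set of codes is known in advance, is to pass an explicit CategoricalDtype to read_csv so that every chunk shares the same categories. This is a sketch under that assumption; the inline CSV text and the four category values are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the real pipe-separated file.
csv_text = "div|netsales\nL|1.5\nJ|2.0\nK|3.25\nU|4.0\n"

# Assumption: the complete list of 'div' codes is known before reading.
div_dtype = CategoricalDtype(categories=["J", "K", "L", "U"])
dtype = {"netsales": "float32", "div": div_dtype}

# Every chunk gets the same categorical dtype, so concat can preserve it.
reader = pd.read_csv(io.StringIO(csv_text), sep="|", dtype=dtype, chunksize=2)
df_concat = pd.concat(list(reader), ignore_index=True)
print(df_concat.dtypes)
```

If the codes are not known up front, converting once after the concat with astype('category') remains the simpler route.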
Without sample data, I cannot help you troubleshoot further.
Upvotes: 1