Reputation: 157
I have a DataFrame column ['Cause'] with dtype object, containing the values below:
Cause
Water
Fire
Earthquake
Flood
When I call the get_dummies() function on this column, I get 4 additional columns with binary values:
Water | Fire | Earthquake | Flood
My question: all 4 of these new columns have dtype uint8. Do I need to convert them to int64?
Upvotes: 1
Views: 5480
Reputation: 21
You don't need to convert afterwards. When calling get_dummies() you can set dtype directly:
import numpy as np
pd.get_dummies(df['Cause'], dtype=np.int64)
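A self-contained sketch of the call above. Rebuilding the question's four-row Cause column as a fresh DataFrame is an assumption made only so the example runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the question's column (illustrative frame)
df = pd.DataFrame({"Cause": ["Water", "Fire", "Earthquake", "Flood"]})

# Ask get_dummies for int64 up front instead of converting afterwards
dummies = pd.get_dummies(df["Cause"], dtype=np.int64)
print(dummies.dtypes)  # every dummy column reports int64
```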
Upvotes: 2
Reputation: 1255
uint8 is the default dtype pandas uses for the dummy columns in versions before 2.0 (since pandas 2.0 the default is bool).
You can always change it to a different dtype.
But remember that the dtype you pass is assigned to all the dummy columns. For example:
pd.get_dummies(df, columns=['col1'], dtype='float64')
creates dummy columns that all have dtype float64.
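A sketch showing that dtype reaches only the generated columns. The extra Severity column and its values are hypothetical, added only to show that a column not listed in columns= keeps its own dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "Cause": ["Water", "Fire", "Earthquake", "Flood"],
    "Severity": [1, 2, 3, 4],  # hypothetical numeric column, not encoded
})

# Only "Cause" is dummified; every generated column gets float64
encoded = pd.get_dummies(df, columns=["Cause"], dtype="float64")
print(encoded.dtypes)  # Severity stays int64, Cause_* columns are float64
```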
Upvotes: 1
Reputation: 9185
Yes, if you don't specify dtype, the dummy columns are uint8 by default (bool since pandas 2.0).
You can request int64 directly:
pd.get_dummies(..., dtype='int64')
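If the dummies already exist, converting them after the fact with DataFrame.astype also works. This is a different route from passing dtype up front, sketched here on the question's values (the DataFrame construction is assumed for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Cause": ["Water", "Fire", "Earthquake", "Flood"]})

# Dummies built with the version-dependent default (uint8 before 2.0, bool after)
dummies = pd.get_dummies(df["Cause"])

# Convert afterwards; 1/0 or True/False both become 1/0 as int64
dummies_int = dummies.astype("int64")
print(dummies_int.dtypes)
```

Passing dtype to get_dummies avoids creating the intermediate frame, so it is the simpler option when you control the original call.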
Upvotes: 5
Reputation: 71600
Well, it's up to you: uint8 still behaves like an integer, so you can use it like any other integer column. You should also know that there is str.get_dummies(), whose output is already int64 by default:
>>> df['Cause'].str.get_dummies()
   Earthquake  Fire  Flood  Water
0           0     0      0      1
1           0     1      0      0
2           1     0      0      0
3           0     0      1      0
>>> df['Cause'].str.get_dummies().dtypes
Earthquake    int64
Fire          int64
Flood         int64
Water         int64
dtype: object
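A sketch of the "behaves like an integer" point. It pins uint8 explicitly (an assumption, so the result does not depend on which default your pandas version uses) and compares it with str.get_dummies():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Cause": ["Water", "Fire", "Earthquake", "Flood"]})

# Pin uint8 so the sketch behaves the same on any pandas version
dummies = pd.get_dummies(df["Cause"], dtype=np.uint8)

# uint8 columns support ordinary integer reductions
per_category = dummies.sum()      # occurrences of each cause
per_row = dummies.sum(axis=1)     # exactly one dummy is set in each row

# The string accessor returns int64 without any dtype argument
str_dummies = df["Cause"].str.get_dummies()
print(str_dummies.dtypes)
```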
Upvotes: 1