Reputation: 107
            age  income  student  credit_rating  Class_buys_computer
0         youth    high       no           fair                   no
1         youth    high       no      excellent                   no
2   middle_aged    high       no           fair                  yes
3        senior  medium       no           fair                  yes
4        senior     low      yes           fair                  yes
5        senior     low      yes      excellent                   no
6   middle_aged     low      yes      excellent                  yes
7         youth  medium       no           fair                   no
8         youth     low      yes           fair                  yes
9        senior  medium      yes           fair                  yes
10        youth  medium      yes      excellent                  yes
11  middle_aged  medium       no      excellent                  yes
12  middle_aged    high      yes           fair                  yes
13       senior  medium       no      excellent                   no
I am using this dataset and want variables such as age, income, etc. to be treated like factor variables in R. How can I do this in Python?
Upvotes: 1
Views: 4773
Reputation: 862641
You can use astype with the 'category' dtype:
cols = ['age', 'income', 'student']
for col in cols:
    df[col] = df[col].astype('category')
print(df.dtypes)
age category
income category
student category
credit_rating object
Class_buys_computer object
dtype: object
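For a self-contained variant of the conversion above, here is a minimal sketch. It assumes a small DataFrame built by hand from three rows of the question's dataset, and it passes a column-to-dtype mapping to astype instead of looping; the names converted, dtype_names and age_levels are illustrative only. The categories of each column play roughly the role of factor levels in R.

```python
import pandas as pd

# Three rows (index 0, 2 and 4) copied from the question's dataset.
df = pd.DataFrame({
    "age": ["youth", "middle_aged", "senior"],
    "income": ["high", "high", "low"],
    "student": ["no", "no", "yes"],
    "credit_rating": ["fair", "fair", "fair"],
})

# Convert selected columns in one call by mapping column name -> dtype.
cols = ["age", "income", "student"]
converted = df.astype({col: "category" for col in cols})

# Collect the resulting dtype names for inspection.
dtype_names = {col: str(dtype) for col, dtype in converted.dtypes.items()}
print(dtype_names)

# Categories are inferred from the data and sorted lexically by default.
age_levels = list(converted["age"].cat.categories)
print(age_levels)
```

Columns not listed in the mapping, such as credit_rating here, keep their original dtype.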
If you need to convert all columns:
for col in df.columns:
    df[col] = df[col].astype('category')
print(df.dtypes)
age category
income category
student category
credit_rating category
Class_buys_computer category
dtype: object
In the pandas version used here the loop is needed, because converting the whole DataFrame at once raises an error:
df = df.astype('category')
NotImplementedError: > 1 ndim Categorical are not supported at this time
See the pandas documentation on categorical data.
EDIT by comment:
If you need an ordered categorical, use a different approach with pandas.Categorical:
df['age'] = pd.Categorical(df['age'], categories=["youth", "middle_aged", "senior"], ordered=True)
print(df.age)
0 youth
1 youth
2 middle_aged
3 senior
4 senior
5 senior
6 middle_aged
7 youth
8 youth
9 senior
10 youth
11 middle_aged
12 middle_aged
13 senior
Name: age, dtype: category
Categories (3, object): [youth < middle_aged < senior]
Then you can sort the DataFrame by the age column, and the rows follow the declared category order:
df = df.sort_values('age')
print (df)
            age  income  student  credit_rating  Class_buys_computer
0         youth    high       no           fair                   no
1         youth    high       no      excellent                   no
7         youth  medium       no           fair                   no
8         youth     low      yes           fair                  yes
10        youth  medium      yes      excellent                  yes
2   middle_aged    high       no           fair                  yes
6   middle_aged     low      yes      excellent                  yes
11  middle_aged  medium       no      excellent                  yes
12  middle_aged    high      yes           fair                  yes
3        senior  medium       no           fair                  yes
4        senior     low      yes           fair                  yes
5        senior     low      yes      excellent                   no
9        senior  medium      yes           fair                  yes
13       senior  medium       no      excellent                   no
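To illustrate what the ordered flag provides beyond sorting, here is a small sketch. It assumes a hand-made five-value age Series rather than the full dataset, and the variable names (ages, ordered_ages, and so on) are illustrative only.

```python
import pandas as pd

# Hypothetical five-value sample of the age column.
ages = pd.Series(["youth", "senior", "middle_aged", "youth", "senior"], name="age")

# Declare the ranking explicitly; ordered=True gives the categories
# a defined order (youth < middle_aged < senior).
ordered_ages = pd.Series(
    pd.Categorical(ages, categories=["youth", "middle_aged", "senior"], ordered=True),
    name="age",
)

# Sorting follows the declared order, not alphabetical order.
sorted_ages = ordered_ages.sort_values().tolist()
print(sorted_ages)

# Integer codes give each value's position in `categories`.
codes = ordered_ages.cat.codes.tolist()
print(codes)

# Ordered categoricals can be compared against a category label.
at_least_middle = (ordered_ages >= "middle_aged").tolist()
print(at_least_middle)
```

Without ordered=True the comparison with >= would raise a TypeError, since unordered categories have no ranking.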
Upvotes: 1