I get the data like this:
train.MSZoning.value_counts()
Out:
RL 1151
RM 218
FV 65
RH 16
C (all) 10
Name: MSZoning, dtype: int64
Then I label-encode it like this:
C (all) => 0
FV => 1
RH => 2
RL => 3
RM => 4
So if I print value_counts() again, I expect it to look like this:
Out:
0 10
1 65
2 16
3 1151
4 218
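One way to get exactly this mapping, sketched here on a small hypothetical Series rather than the real train data, is pandas' categorical dtype: its categories are sorted lexicographically by default, so the codes come out as C (all)=0, FV=1, RH=2, RL=3, RM=4. Note that value_counts() orders its output by frequency, so sort_index() is needed to list the codes in 0..4 order as above.

```python
import pandas as pd

# Hypothetical stand-in for train.MSZoning containing the same five labels
zoning = pd.Series(["RL", "RM", "FV", "RH", "C (all)", "RL"], name="MSZoning")

# Categorical dtype: categories are sorted, then each value gets an integer code
as_cat = zoning.astype("category")
mapping = dict(enumerate(as_cat.cat.categories))
codes = as_cat.cat.codes

print(mapping)  # {0: 'C (all)', 1: 'FV', 2: 'RH', 3: 'RL', 4: 'RM'}

# Count each code and list them in code order rather than by frequency
print(codes.value_counts().sort_index())
```

The same call on the full column would give one integer column whose counts match the table above.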
Then I tried pd.get_dummies() like this:
t = pd.get_dummies(train.MSZoning)
print(t)
Out:
C (all) FV RH RL RM
0 0 0 0 1 0
1 0 0 0 1 0
2 0 0 0 1 0
3 0 0 0 1 0
4 0 0 0 1 0
5 0 0 0 1 0
...
Then I printed pd.DataFrame(t).describe() to get summary statistics for it:
C (all) FV RH RL RM
count 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000
mean 0.006849 0.044521 0.010959 0.788356 0.149315
std 0.082505 0.206319 0.104145 0.408614 0.356521
min 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 1.000000 0.000000
50% 0.000000 0.000000 0.000000 1.000000 0.000000
75% 0.000000 0.000000 0.000000 1.000000 0.000000
max 1.000000 1.000000 1.000000 1.000000 1.000000
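The means in this table are each category's share of the rows: for example 10 / 1460 ≈ 0.006849 for C (all) and 1151 / 1460 ≈ 0.788356 for RL, matching the value_counts() output above. A small sketch on hypothetical toy data shows the same relationship, and also that get_dummies() returns a DataFrame with one column per category rather than a single column:

```python
import pandas as pd

# Hypothetical toy column (not the real training data)
zoning = pd.Series(["RL", "RL", "RL", "RM", "FV"], name="MSZoning")

# get_dummies returns a DataFrame: one indicator column per distinct label
dummies = pd.get_dummies(zoning)
print(type(dummies).__name__, dummies.shape)  # DataFrame (5, 3)

# Cast to int (newer pandas returns bool indicators) and take column means
means = dummies.astype(int).mean()

# Relative frequency of each label, aligned to the dummy columns' order
shares = zoning.value_counts(normalize=True).reindex(means.index)

print(means.to_dict())
print(shares.to_dict())
```

Both printed dicts hold the same numbers (0.2 for FV and RM, 0.6 for RL), which is why describe() on the dummies reproduces the normalized value counts.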
But when I use pd.get_dummies() this way, I get something different, which puzzles me:
train.MSZoning = pd.get_dummies(train.MSZoning)
print(train.MSZoning)
Out:
0 1
1 1
2 1
3 1
4 1
5 1
...
train.MSZoning.describe()
Out:
count 1460.000000
mean 0.993151
std 0.082505
min 0.000000
25% 1.000000
50% 1.000000
75% 1.000000
max 1.000000
Name: MSZoning, dtype: float64
Why do I get two different results depending on whether or not I assign the output of get_dummies() back to train.MSZoning?
Could anyone help me understand this?
Thanks in advance.
Answer:
I think you should reconsider this line:
train.MSZoning = pd.get_dummies(train.MSZoning)
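You are assigning a DataFrame (get_dummies() returns one indicator column per category, five columns here) to train.MSZoning, which is a single column, i.e. a Series.
I'm not sure exactly what pandas does with that assignment, but the result is almost certainly not what you intend.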
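If the goal is to replace MSZoning with its indicator columns, one option (sketched here on a hypothetical two-column frame, not the real train data) is to let get_dummies() handle the column through its columns parameter, or equivalently to build the dummies separately and concat them back. Either way the result gets one new column per category instead of squeezing them into the old one:

```python
import pandas as pd

# Hypothetical miniature version of the training frame
train = pd.DataFrame({"Id": [1, 2, 3, 4], "MSZoning": ["RL", "RM", "RL", "C (all)"]})

# Option 1: get_dummies drops MSZoning and appends prefixed indicator columns
encoded = pd.get_dummies(train, columns=["MSZoning"])

# Option 2: build the indicators separately, then join them to the other columns
dummies = pd.get_dummies(train["MSZoning"], prefix="MSZoning")
encoded2 = pd.concat([train.drop(columns="MSZoning"), dummies], axis=1)

print(list(encoded.columns))
# ['Id', 'MSZoning_C (all)', 'MSZoning_RL', 'MSZoning_RM']
```

Option 1 is shorter; option 2 is handy when you want to inspect or filter the dummy columns before joining them back.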