Bowen Peng

Reputation: 1825

A puzzling result after calling pandas.get_dummies()

My data looks like this:

train.MSZoning.value_counts()
Out:
RL         1151
RM          218
FV           65
RH           16
C (all)      10
Name: MSZoning, dtype: int64

Then I try to label-encode it like this:

C (all) => 0
FV => 1
RH => 2
RL => 3
RM => 4
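A minimal sketch of this encoding in pandas, assuming a small hypothetical sample in place of the real `train.MSZoning`: converting the column to the `category` dtype assigns integer codes in sorted category order, which here reproduces the mapping above.

```python
import pandas as pd

# Hypothetical sample standing in for train.MSZoning
zoning = pd.Series(["RL", "RM", "C (all)", "FV", "RH", "RL"], name="MSZoning")

# Inferred categories are sorted: C (all), FV, RH, RL, RM
as_cat = zoning.astype("category")
codes = as_cat.cat.codes

# Category -> integer code, matching the table above
mapping = {cat: i for i, cat in enumerate(as_cat.cat.categories)}

print(mapping)
print(codes.tolist())
```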

So I expect that printing value_counts() again would give something like this:

Out:
0           10 
1           65
2           16
3           1151
4           218
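One detail worth knowing here: `value_counts()` orders its result by frequency (most common first), so the label-ordered listing above is what you get only after sorting by the index. A sketch on hypothetical data (chosen without ties so the frequency order is unambiguous), using `sort_index()` to recover the order of the codes:

```python
import pandas as pd

# Hypothetical sample standing in for train.MSZoning (no tied counts)
zoning = pd.Series(
    ["RL"] * 5 + ["RM"] * 4 + ["FV"] * 3 + ["RH"] * 2 + ["C (all)"],
    name="MSZoning",
)
codes = zoning.astype("category").cat.codes  # C (all)=0, FV=1, RH=2, RL=3, RM=4

# Default: sorted by count, descending
by_count = codes.value_counts()
# Sorted by code 0, 1, 2, ... instead
by_label = codes.value_counts().sort_index()

print(by_count)
print(by_label)
```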

Then I call pandas.get_dummies() like this:

t = pd.get_dummies(train.MSZoning)
print(t)
Out:
    C (all) FV  RH  RL  RM
0   0   0   0   1   0
1   0   0   0   1   0
2   0   0   0   1   0
3   0   0   0   1   0
4   0   0   0   1   0
5   0   0   0   1   0
...
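The shape of this output matters for what follows: `get_dummies` on a Series returns a two-dimensional `DataFrame` with one column per category (in sorted order) and exactly one 1 in each row. A sketch on a hypothetical sample; `dtype=int` is passed because recent pandas versions default to boolean dummy columns rather than 0/1 integers:

```python
import pandas as pd

# Hypothetical sample standing in for train.MSZoning
zoning = pd.Series(["RL", "RM", "C (all)", "FV", "RH", "RL"], name="MSZoning")

# dtype=int gives 0/1 columns instead of the newer boolean default
dummies = pd.get_dummies(zoning, dtype=int)

print(type(dummies).__name__)        # a DataFrame, not a Series
print(dummies.columns.tolist())      # one column per category, sorted
print(dummies.sum(axis=1).tolist())  # each row has exactly one 1
```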

Printing pd.DataFrame(t).describe() gives this description:

        C (all)     FV          RH          RL          RM
count   1460.000000 1460.000000 1460.000000 1460.000000 1460.000000
mean    0.006849    0.044521    0.010959    0.788356    0.149315
std     0.082505    0.206319    0.104145    0.408614    0.356521
min     0.000000    0.000000    0.000000    0.000000    0.000000
25%     0.000000    0.000000    0.000000    1.000000    0.000000
50%     0.000000    0.000000    0.000000    1.000000    0.000000
75%     0.000000    0.000000    0.000000    1.000000    0.000000
max     1.000000    1.000000    1.000000    1.000000    1.000000
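Since each dummy column holds only 0s and 1s, its `mean` in `describe()` is just the share of rows in that category: for `C (all)`, 10 / 1460 ≈ 0.006849, and for `RL`, 1151 / 1460 ≈ 0.788356, matching the table. A sketch checking this against `value_counts(normalize=True)` on hypothetical data:

```python
import pandas as pd

# Hypothetical sample standing in for train.MSZoning (15 rows)
zoning = pd.Series(
    ["RL"] * 5 + ["RM"] * 4 + ["FV"] * 3 + ["RH"] * 2 + ["C (all)"],
    name="MSZoning",
)
dummies = pd.get_dummies(zoning, dtype=int)

# Mean of a 0/1 column = fraction of rows in that category
means = dummies.mean()
# Same fractions computed directly, aligned on category name
shares = zoning.value_counts(normalize=True).reindex(means.index)

print(means)
print(shares)
```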

But when I assign the result of pd.get_dummies() back to the column, I get something different, which puzzles me:

train.MSZoning = pd.get_dummies(train.MSZoning)
print(train.MSZoning)
Out:
0       1
1       1
2       1
3       1
4       1
5       1
...

train.MSZoning.describe()
Out:
count    1460.000000
mean        0.993151
std         0.082505
min         0.000000
25%         1.000000
50%         1.000000
75%         1.000000
max         1.000000
Name: MSZoning, dtype: float64

Why do I get two different results depending on whether I keep the output of get_dummies() in a separate variable or assign it back to the column?

Could anyone help me?

Thanks in advance.

Upvotes: 0

Views: 45

Answers (1)

Franco Piccolo

Reputation: 7420

I think you should reconsider this line:

train.MSZoning = pd.get_dummies(train.MSZoning)

You are assigning a DataFrame to a Series.
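To make the mismatch concrete, here is a small sketch on a hypothetical stand-in for `train`: the target `train.MSZoning` is a single column (a `Series`), while `get_dummies` returns a `DataFrame` with one column per category. How pandas treats such an assignment has changed between versions; in recent versions I would expect it to fail with a `ValueError`, whereas the output in the question shows that the version used there accepted it without complaint.

```python
import pandas as pd

# Hypothetical stand-in for the real train data
train = pd.DataFrame({"MSZoning": ["RL", "RM", "C (all)", "FV", "RH"]})
dummies = pd.get_dummies(train.MSZoning, dtype=int)

# Left-hand side: one column, shape (5,)
print(type(train.MSZoning).__name__, train.MSZoning.shape)
# Right-hand side: a 2-D frame, shape (5, 5), one column per category
print(type(dummies).__name__, dummies.shape)
```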

I'm not sure exactly what pandas does with that assignment, but my guess is that the result is not what you intended.
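If the goal is to replace `MSZoning` with its dummy columns, here is a sketch of two common approaches, again on a hypothetical stand-in for `train` (the `Id` column is made up for illustration): pass the whole DataFrame to `pd.get_dummies` with the `columns` argument, or build the dummies separately and `pd.concat` them next to the remaining columns.

```python
import pandas as pd

# Hypothetical stand-in for the real train data
train = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "MSZoning": ["RL", "RM", "C (all)", "FV", "RH"],
})

# Option 1: get_dummies drops MSZoning and appends prefixed dummy columns
encoded = pd.get_dummies(train, columns=["MSZoning"], dtype=int)

# Option 2: build the dummies explicitly, then concatenate column-wise
dummies = pd.get_dummies(train["MSZoning"], prefix="MSZoning", dtype=int)
encoded2 = pd.concat([train.drop(columns="MSZoning"), dummies], axis=1)

print(encoded.columns.tolist())
```

Either way each category becomes its own column, instead of trying to squeeze several columns into the single `MSZoning` slot.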

Upvotes: 1
