daiyue

Reputation: 7458

pandas create boolean column using groupby transform

I am trying to create a boolean column using GroupBy.transform on a DataFrame like this:

id    type
1     1.00000
1     1.00000
2     2.00000
2     3.00000
3     2.00000

The code is:

df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2)

But instead of boolean values, has_two contains float values, e.g. 0.0. Why is that?

UPDATE

I created a test case:

df = pd.DataFrame({'id':['1', '1', '2', '2', '3'], 'type':[1.0, 1.0, 2.0, 1.0, 2.0]})
df['has_2'] = df.groupby('id')['type'].transform(lambda x: x == 2)

This gave me:

   id  type  has_2
0  1   1.0    0.0
1  1   1.0    0.0
2  2   2.0    1.0
3  2   1.0    0.0
4  3   2.0    1.0

If I use df['has_2'] = df['type'] == 2, as suggested by jezrael, it works fine:

   id  type  has_2
0  1   1.0  False
1  1   1.0  False
2  2   2.0   True
3  2   1.0  False
4  3   2.0   True

I am using pandas==0.20.3 on Python 3.5.2. What is going on here? Do I need to update pandas or Python 3?

UPDATE

Updating pandas to 0.22.0 fixed this issue.

Upvotes: 3

Views: 1549

Answers (2)

Patel

Reputation: 200

Use this line

df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2).astype(bool)

Worked for me :)

Upvotes: 0

jezrael

Reputation: 863291

It works fine for me; I get a boolean column:

df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2)
print(df)
   id  type  has_two
0   1   1.0    False
1   1   1.0    False
2   2   2.0     True
3   2   3.0    False
4   3   2.0     True

But maybe it is enough to compare the column directly, without groupby:

df['has_two'] = df['type'] == 2
print(df)
   id  type  has_two
0   1   1.0    False
1   1   1.0    False
2   2   2.0     True
3   2   3.0    False
4   3   2.0     True
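
To check which dtype you actually get, here is a minimal sketch using the test data from the question. It compares the direct comparison with the transform version and adds an explicit .astype(bool) cast. The cast is my assumption about a version-independent workaround, not something confirmed for every pandas release, and the names direct and via_transform are only illustrative:

```python
import pandas as pd

# Test data taken from the question's UPDATE section.
df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '3'],
    'type': [1.0, 1.0, 2.0, 1.0, 2.0],
})

# Element-wise comparison; no groupby is involved.
direct = df['type'] == 2

# Same comparison through transform, cast explicitly so the result is bool
# even if transform hands back 0.0/1.0 floats (assumed workaround).
via_transform = (
    df.groupby('id')['type']
      .transform(lambda x: x == 2)
      .astype(bool)
)

df['has_2'] = via_transform

# Inspect the dtypes and confirm both approaches agree row by row.
print(df.dtypes)
print(direct.equals(via_transform))
```

Since the comparison is element-wise, the grouping does not change the result, so the direct comparison is the simpler choice here.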

Upvotes: 2
