Reputation:
I have a df as:
df:
col1 col2 col3 col4 col5
0 1.36 4.31 7.66 2 2
1 2.62 3.30 2.48 2 1
2 5.19 3.58 1.62 0 2
3 2.06 3.16 3.50 1 1
4 2.19 2.98 3.38 1 1
I want
col6 to return 1 when (col4 > 1 and col5 > 1) else 0
and
col7 to return 1 when (col4 > 1 and col5 > 1 and col 4 + col5 > 2) else 0
I am trying
df.loc[df['col4'] > 0, df['col5'] > 0, 'col6'] = '1'
however I am getting the error:
File "pandas\_libs\index.pyx", line 269, in pandas._libs.index.IndexEngine.get_indexer
File "pandas\_libs\hashtable_class_helper.pxi", line 5247, in pandas._libs.hashtable.PyObjectHashTable.lookup
TypeError: unhashable type: 'Series'
How can I perform this operation?
Upvotes: 4
Views: 395
Reputation: 28729
Create a variable to hold the columns :
columns = ['col4', 'col5']
Condition 1 :
cond1 = df.filter(columns).gt(1).all(1)
Condition 2:
cond2 = df.filter(columns).sum(1).gt(2)
Create new columns via assign:
df.assign(col6 = cond1.astype(int),
col7 = (cond1 & cond2).astype(int)
)
col1 col2 col3 col4 col5 col6 col7
0 1.36 4.31 7.66 2 2 1 1
1 2.62 3.30 2.48 2 1 0 0
2 5.19 3.58 1.62 0 2 0 0
3 2.06 3.16 3.50 1 1 0 0
4 2.19 2.98 3.38 1 1 0 0
Upvotes: 1
Reputation: 323386
We can do something like eval
df['col6'] = df.eval('col4 > 1 and col5 > 1').astype(int)
df['col7'] = df.eval('col4 > 1 and col5 > 1 and col4 + col5 > 2').astype(int)
Upvotes: 2
Reputation: 114558
When you do a bitwise operation on Series objects or arrays, you get an array of booleans, each of whose elements is True or False. Those are basically 0 or 1, and in fact more convenient in most cases:
df['col6'] = (df['col4'] > 1) & (df['col5'] > 1)
df['col7'] = df['col6']
That last one is not a clever trick. If two numbers are both >1, then of course their sum must be >2. If you absolutely want the integers 0 and 1 instead of booleans, use Series.astype
:
df['col6'] = ((df['col4'] > 1) & (df['col5'] > 1)).astype(int)
Upvotes: 2
Reputation: 23166
You can simple use bitwise operators:
df['col6'] = ((df["col4"]>1) & (df["col5"]>1))*1
df['col7'] = ((df["col4"]>1) & (df["col5"]>1) & (df['col4']+df['col5']>2))*1
>>> df
col1 col2 col3 col4 col5 col6 col7
0 1.36 4.31 7.66 2 2 1 1
1 2.62 3.30 2.48 2 1 0 0
2 5.19 3.58 1.62 0 2 0 0
3 2.06 3.16 3.50 1 1 0 0
4 2.19 2.98 3.38 1 1 0 0
Upvotes: 1
Reputation: 24322
try:
c1=df['col4'].gt(1) & df['col5'].gt(1)
#your 1st condition
c2=c1 & df['col4'].add(df['col5']).gt(2)
#your 2nd condition
Finally:
df['col6']=c1.astype(int)
df['col7']=c2.astype(int)
OR
via numpy's where()
method:
c1=df['col4'].gt(1) & df['col5'].gt(1)
c2=c1 & df['col4'].add(df['col5']).gt(2)
df['col6']=np.where(c1,1,0)
df['col7']=np.where(c2,1,0)
Upvotes: 1