LamaMo

Reputation: 626

Add column based on different conditions for different columns | python pandas

I have a dataframe with 4 columns:

c1        c2        c3      GName
0.221445  0.300534  5.689   KDD
0.001000  0.969000  15.140  ACC
1.000000  0.094000  -0.245  QETF

And a second dataframe called file, with one column:

GName
Abd
kkoew
KDD
pwqh
ACC
dsewf

I need to add a new column called label, based on the scores in c1, c2 and c3 and on GName.

If the majority of the 3 scores meet their conditions (2 out of 3, or all 3) and the value of GName exists in the dataframe file, then label = 1; otherwise label = 0.

The conditions are:
c1 should be > 0.95
c2 should be > 0.50
c3 should be > 15

The output will be like this:

c1        c2        c3      GName label
0.221445  0.300534  5.689   KDD   0  (because 0 out of 3 and KDD in file)
0.001000  0.969000  15.140  ACC   1  (because 2 out of 3 and ACC in file)
1.000000  0.94060  -0.245  QETF   0  (because 2 out of 3 but QETF not in file)

I'm struggling to combine these different conditions. Any help, please?

Upvotes: 0

Views: 36

Answers (1)

CJR

Reputation: 3985

The way I would do it is this:

import pandas as pd

df = pd.DataFrame({'c1':[0.221445, 0.001000, 1.000000],
                   'c2':[0.300534, 0.969000, 0.094000],
                   'c3':[5.689, 15.140, -0.245],
                   'GName':['KDD', 'ACC', 'QETF']})
file = pd.DataFrame({'GName':['KDD', 'ACC']})

# Count how many of the three score conditions each row satisfies
conditions = (df['c1'] > 0.95).astype(int) + (df['c2'] > 0.5).astype(int) + (df['c3'] > 15).astype(int)
# Require a majority (at least 2 of 3) and a GName that appears in file
conditions = (conditions >= 2) & (df['GName'].isin(file['GName']))
# Default every row to 0, then set the matching rows to 1
df['label'] = 0
df.loc[conditions, 'label'] = 1

>>> df
         c1        c2      c3 GName  label
0  0.221445  0.300534   5.689   KDD      0
1  0.001000  0.969000  15.140   ACC      1
2  1.000000  0.094000  -0.245  QETF      0
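
If you end up with more score columns, the same idea can be written more compactly. This is only a sketch: thresholds is a name I made up for illustration, and file here holds the full list of names from the question. It compares each column with its own threshold, counts the passes per row, and combines that count with the isin check:

```python
import pandas as pd

df = pd.DataFrame({'c1': [0.221445, 0.001000, 1.000000],
                   'c2': [0.300534, 0.969000, 0.094000],
                   'c3': [5.689, 15.140, -0.245],
                   'GName': ['KDD', 'ACC', 'QETF']})
file = pd.DataFrame({'GName': ['Abd', 'kkoew', 'KDD', 'pwqh', 'ACC', 'dsewf']})

# Hypothetical helper: one threshold per score column (name chosen for illustration)
thresholds = pd.Series({'c1': 0.95, 'c2': 0.50, 'c3': 15})

# Compare each score column with its own threshold, then count the passes per row
passed = df[thresholds.index].gt(thresholds, axis=1).sum(axis=1)

# Majority of conditions met AND GName present in file
in_file = df['GName'].isin(file['GName'])
df['label'] = ((passed >= 2) & in_file).astype(int)
```

Adding another score column then only means adding one entry to thresholds, though you would also need to decide what counts as a majority.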

It would be nice if you could include code to generate your dataframe in your question, as well.

Upvotes: 1
