Create a column based on a condition python pandas

Question

Here is the sample data

import pandas as pd
df=pd.DataFrame({'P_Name':['ABC','ABC','ABC','ABC','PQR','PQR','PQR','PQR','XYZ','XYZ','XYZ','XYZ'],
   'Date':['11/01/2020','12/01/2020','13/01/2020','14/01/2020','11/01/2020','12/01/2020','13/01/2020','14/01/2020','11/01/2020','12/01/2020','13/01/2020','14/01/2020'],
    'Open':['242.584','238.179','233.727','229.441','241.375','28.965','235.96','233.193','280.032','78.472','277.592','276.71'],
    'End':['4.405','4.452','4.286','4.405','2.41','3.005','2.767','3.057','1.56','0.88','0.882','0.88'],
    'Close':['238.179','233.727','229.441','225.036','238.965','235.96','233.193','230.136','278.472','277.592','276.71','275.83']})

I'm trying to create a new column where the condition will be that for every new product entry, the corresponding will be 1 AND will also have to check the condition if df['Close'][0] == df['Open'][1] are same the value will be 1 if not same(E.g df['Close'][8] == df['Open'][9]) then 0 df after these conditions

   P_Name        Date     Open    End    Close   Check
0     ABC  11/01/2020  242.584  4.405  238.179   1
1     ABC  12/01/2020  238.179  4.452  233.727   1
2     ABC  13/01/2020  233.727  4.286  229.441   1
3     ABC  14/01/2020  229.441  4.405  225.036   1
4     PQR  11/01/2020  241.375   2.41  238.965   1
5     PQR  12/01/2020  28.965   3.005   235.96   0
6     PQR  13/01/2020   235.96  2.767  233.193   1
7     PQR  14/01/2020  233.193  3.057  230.136   1
8     XYZ  11/01/2020  280.032   1.56  278.472   1
9     XYZ  12/01/2020   78.472   0.88  277.592   0
10    XYZ  13/01/2020  277.592  0.882   276.71   1
11    XYZ  14/01/2020   276.71   0.88   275.83   1

jezrael · Accepted Answer

You can compare shifted values per groups by DataFrameGroupBy.shift with Series.eq with replace missing values by another column by Series.fillna with cast mask to 0,1 with Series.astype:

df['Check'] = df.Open.eq(df.groupby('P_Name').Close.shift().fillna(df.Open)).astype(int)

Anothr idea is compare without groups, but chain another mask with Series.duplicated for match first rows per groups:

df['Check'] = (~df.P_Name.duplicated() | df.Open.eq(df.Close.shift())).astype(int)

print (df)
   P_Name        Date     Open    End    Close  Check
0     ABC  11/01/2020  242.584  4.405  238.179      1
1     ABC  12/01/2020  238.179  4.452  233.727      1
2     ABC  13/01/2020  233.727  4.286  229.441      1
3     ABC  14/01/2020  229.441  4.405  225.036      1
4     PQR  11/01/2020  241.375   2.41  238.965      1
5     PQR  12/01/2020   28.965  3.005   235.96      0
6     PQR  13/01/2020   235.96  2.767  233.193      1
7     PQR  14/01/2020  233.193  3.057  230.136      1
8     XYZ  11/01/2020  280.032   1.56  278.472      1
9     XYZ  12/01/2020   78.472   0.88  277.592      0
10    XYZ  13/01/2020  277.592  0.882   276.71      1
11    XYZ  14/01/2020   276.71   0.88   275.83      1

Create a column based on a condition python pandas

Answers (2)

Related Questions