Reputation: 322
I want to compare the values in every row of a dataframe to a specific column within the same row. I managed to do it by iterating over all rows, and it works OK for smaller datasets, but it becomes slow as the number of rows and columns increases.
Is there a more efficient way to accomplish this with pandas?
Example of my current solution:
import numpy as np
import pandas as pd

data = np.array([['Identifier','N1','N2','N3','N4','mean'],
                 ['Row1',1,2,3,4,2.5],
                 ['Row2',5,4,3,2,3.5],
                 ['Row3',1,5,1,5,3],
                 ['Row4',1,2,3,10,4]
                 ])
df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
df.head()
result:
     N1 N2 N3  N4 mean
Row1  1  2  3   4  2.5
Row2  5  4  3   2  3.5
Row3  1  5  1   5    3
Row4  1  2  3  10    4
To turn this into a boolean dataframe, I do the following:
# new dataframe with the same structure
df_bools = pd.DataFrame().reindex_like(df)
df_bools["mean"] = df["mean"]
# iterate over the rows and compare each value to that row's mean
for index, row in df.iterrows():
    colcnt = 0
    for i in row[0:-1]:
        df_bools.iloc[df.index.get_loc(index), colcnt] = (i > row["mean"])
        colcnt += 1
df_bools.head()
and the desired result:
         N1     N2     N3     N4 mean
Row1  False  False   True   True  2.5
Row2   True   True  False  False  3.5
Row3  False   True  False   True    3
Row4  False  False  False  False    4
Upvotes: 1
Views: 38
Reputation: 323226
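IIUC, you can compare the first four columns against the mean column in one vectorized call, aligning on the row index (axis 0):
df.iloc[:, :4] = df.iloc[:, :4].gt(df['mean'], axis=0)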
df
Out[1015]:
         N1     N2     N3     N4 mean
Row1  False  False   True   True  2.5
Row2   True   True  False  False  3.5
Row3  False   True  False   True    3
Row4  False  False  False  False    4
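One thing to watch for: because the np.array in the question mixes strings and numbers, every cell ends up stored as a string, so the comparison is lexicographic ('10' > '4' is False, which is why Row4/N4 shows False above). The following is a minimal sketch of the same gt approach that assumes the values are meant to be numeric and converts them first; the names df_num and value_cols are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question; np.array stores every cell as a string here.
data = np.array([['Identifier', 'N1', 'N2', 'N3', 'N4', 'mean'],
                 ['Row1', 1, 2, 3, 4, 2.5],
                 ['Row2', 5, 4, 3, 2, 3.5],
                 ['Row3', 1, 5, 1, 5, 3],
                 ['Row4', 1, 2, 3, 10, 4]])
df = pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])

# Assumption: the values are meant to be numbers, so convert them first.
df_num = df.astype(float)

# Compare every column except "mean" against "mean", aligned on the row index.
value_cols = df_num.columns.drop('mean')
df_bools = df_num[value_cols].gt(df_num['mean'], axis=0)

# Carry the mean column over unchanged, as in the question's df_bools.
df_bools['mean'] = df_num['mean']
print(df_bools)
```

With numeric values, Row4/N4 comes out as True, since 10 > 4. Selecting the columns by name rather than with iloc[:, :4] is just a design choice so the sketch keeps working if more N columns are added.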
Upvotes: 1