Reputation: 3288
I'm new to pandas and working with indices, especially MultiIndex. I have a DataFrame as such:
df = pd.DataFrame({
'ID':[1,2,1,2],
'Measurement': ['ScanA', 'ScanA', 'ScanB', 'ScanB'],
'Result':[0.1,0.2,0.5,0.7],
'ResultType':['A','B','C','B']})
piv = df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'ResultType'])
This creates two indices Result
and Type
, but I'd like to modify the values in the Result
index based on the Type
value.
For example, if Type == 'C'
then I want the corresponding Result
to be -1
.
Another example would be if Type in ('A', 'B')
then I want to set Result
to be 0 if < 0.5 else 1
How can I programatically do this without looping through each row/column?
Output for piv.Result
would look like:
Measurement ScanA ScanB
ID
1 0.0 -1
2 0.0 1
Upvotes: 1
Views: 139
Reputation: 46
There is a method in the pandas
library called loc()
which allows the user to find the values of the row through calling its label. This function, along with the conditional selection offered by pandas, allows the user to find the row where certain conditions are true (similar to the WHERE
functionality found in SQL) and set values for other columns in that row. Using these methods, I have added a portion to your code that sets the 'Result'
column to -1
when 'Type' == 'A'
. You can apply the same structure to alter other 'Result'
rows based on 'Type'
values.
df = pd.DataFrame({
'ID':[1,2,1,2],
'Measurement': ['ScanA', 'ScanA', 'ScanB', 'ScanB'],
'Result':[0.1,0.2,0.5,0.7],
'Type':['A','B','A','C']})
piv = df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])
df.loc[df['Type'] == 'C', ['Result']] = -1
print(df)
Upvotes: 0
Reputation: 38415
You can first modify the dataframe and then pivot
df.loc[df['Type'] == 'C', 'Result'] = -1
df.loc[(df['Type'].isin(['A', 'B'])) & (df['Result'] < 0.5), 'Result'] = 0
df.loc[(df['Type'].isin(['A', 'B'])) & (df['Result'] >= 0.5), 'Result'] = 1
df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])
Result Type
Measurement ScanA ScanB ScanA ScanB
ID
1 0 1 A A
2 0 -1 B C
Edit: An alternative using np.select as @Wen-Ben suggested
cond = [df['Type'] == 'C', (df['Type'].isin(['A', 'B'])) & (df['Result'] < 0.5), (df['Type'].isin(['A', 'B'])) & (df['Result'] >= 0.5)]
choice = [-1, 0, 1]
df['Result'] = np.select(cond, choice)
df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])
Upvotes: 3
Reputation: 402263
As an alternative to what Vaishali suggested, you can use numpy.where
to modify piv
after it is generated.
t = piv['Type']
r = piv['Result'].astype(float)
piv.loc[:, 'Result'] = np.where(
t == 'A', np.where(r < 0.5, 0, 1), np.where(t == 'C', -1, r))
piv
Result Type
Measurement ScanA ScanB ScanA ScanB
ID
1 0.0 1.0 A A
2 0.2 -1.0 B C
Upvotes: 3