TomNash
TomNash

Reputation: 3288

Modify values in a DataFrame index based on another index value

I'm new to pandas and working with indices, especially MultiIndex. I have a DataFrame as such:

df = pd.DataFrame({
    'ID':[1,2,1,2],
    'Measurement': ['ScanA', 'ScanA', 'ScanB', 'ScanB'],
    'Result':[0.1,0.2,0.5,0.7],
    'ResultType':['A','B','C','B']})
piv = df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'ResultType'])

This creates two indices Result and Type, but I'd like to modify the values in the Result index based on the Type value.

For example, if Type == 'C' then I want the corresponding Result to be -1.

Another example would be if Type in ('A', 'B') then I want to set Result to be 0 if < 0.5 else 1

How can I programatically do this without looping through each row/column?

Output for piv.Result would look like:

Measurement     ScanA   ScanB
ID      
1               0.0     -1
2               0.0     1

Upvotes: 1

Views: 139

Answers (3)

Daniel Weber
Daniel Weber

Reputation: 46

There is a method in the pandas library called loc() which allows the user to find the values of the row through calling its label. This function, along with the conditional selection offered by pandas, allows the user to find the row where certain conditions are true (similar to the WHERE functionality found in SQL) and set values for other columns in that row. Using these methods, I have added a portion to your code that sets the 'Result' column to -1 when 'Type' == 'A'. You can apply the same structure to alter other 'Result' rows based on 'Type' values.

df = pd.DataFrame({
    'ID':[1,2,1,2],
    'Measurement': ['ScanA', 'ScanA', 'ScanB', 'ScanB'],
    'Result':[0.1,0.2,0.5,0.7],
    'Type':['A','B','A','C']})
piv = df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])
df.loc[df['Type'] == 'C', ['Result']] = -1
print(df)

Upvotes: 0

Vaishali
Vaishali

Reputation: 38415

You can first modify the dataframe and then pivot

df.loc[df['Type'] == 'C', 'Result'] = -1
df.loc[(df['Type'].isin(['A', 'B'])) & (df['Result'] < 0.5), 'Result'] = 0
df.loc[(df['Type'].isin(['A', 'B'])) & (df['Result'] >= 0.5), 'Result'] = 1

df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])

            Result          Type
Measurement ScanA   ScanB   ScanA   ScanB
ID              
1           0       1       A       A
2           0       -1      B       C

Edit: An alternative using np.select as @Wen-Ben suggested

cond = [df['Type'] == 'C', (df['Type'].isin(['A', 'B'])) & (df['Result'] < 0.5), (df['Type'].isin(['A', 'B'])) & (df['Result'] >= 0.5)]
choice = [-1, 0, 1]
df['Result'] = np.select(cond, choice)

df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])

Upvotes: 3

cs95
cs95

Reputation: 402263

As an alternative to what Vaishali suggested, you can use numpy.where to modify piv after it is generated.

t = piv['Type']  
r = piv['Result'].astype(float)

piv.loc[:, 'Result'] = np.where(
    t == 'A', np.where(r < 0.5, 0, 1), np.where(t == 'C', -1, r))
piv

            Result        Type      
Measurement  ScanA ScanB ScanA ScanB
ID                                  
1              0.0   1.0     A     A
2              0.2  -1.0     B     C

Upvotes: 3

Related Questions