yosuef alotaibi
yosuef alotaibi

Reputation: 11

how to swap two columns and flip a third in panda data frame?

I'm conducting an experiment(using python 2.7, panda 0.23.4) where I have three levels of a stimulus {a,b,c} and present all different combinations to participants, and they have to choose which one was rougher? (example: Stimulus 1 = a , Stimulus 2=b, participant choose 1 indicating stimulus 1 was rougher)

After the experiment, I have a data frame with three columns like this:

import pandas as pd

d = {'Stim1':  ['a', 'b', 'a', 'c', 'b', 'c'],
     'Stim2': ['b', 'a', 'c', 'a', 'c', 'b'],
     'Answer': [1, 2, 2, 1, 2, 1]}
df = pd.DataFrame(d)

        Stim1  Stim2  Answer
    0     a     b      1
    1     b     a      2
    2     a     c      2
    3     c     a      1
    4     b     c      2
    5     c     b      1 

For my analysis, the order of which stimulus came first doesn't matter. Stim1= a, Stim2= b is the same as Stim1= b, Stim2= a. I'm trying to figure out how can I swap Stim1 and Stim2 and flip their Answer to be like this:

        Stim1  Stim2  Answer
    0     a     b      1
    1     a     b      1
    2     a     c      2
    3     a     c      2
    4     b     c      2
    5     b     c      2

I read that np.where can be used, but it would do one thing at a time, where I want to do two (swap and flip).

Is there some way to use another function to do swap and flip at the same time?

Upvotes: 1

Views: 192

Answers (2)

Cimbali
Cimbali

Reputation: 11395

You can start by building a boolean series that indicates which rows should be swapped or not:

>>> swap = df['Stim1'] > df['Stim2']
>>> swap
0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

Then build the fully swapped dataframe as follows:

>>> swapped_df = pd.concat([
...     df['Stim1'].rename('Stim2'),
...     df['Stim2'].rename('Stim1'),
...     3 - df['Answer'],
... ], axis='columns')
>>> swapped_df
  Stim2 Stim1  Answer
0     a     b       2
1     b     a       1
2     a     c       1
3     c     a       2
4     b     c       1
5     c     b       2

Finally, use .mask() to select initial rows or swapped rows:

>>> df.mask(swap, swapped_df)
  Stim1 Stim2  Answer
0     a     b       1
1     a     b       1
2     a     c       2
3     a     c       2
4     b     c       2
5     b     c       2

NB .mask is roughly the same as .where, but it replaces rows where the series is True instead of keeping the rows that are True. This is exactly the same:

>>> swapped_df.where(swap, df)
  Stim2 Stim1  Answer
0     b     a       1
1     b     a       1
2     c     a       2
3     c     a       2
4     c     b       2
5     c     b       2

Upvotes: 2

mozway
mozway

Reputation: 260580

Can you try if this works for you?

import pandas as pd
import numpy as np

df = pd.DataFrame(d)

# keep a copy of the original Stim1 column
s = df['Stim1'].copy()

# sort the values
df[['Stim1', 'Stim2']] = np.sort(df[['Stim1', 'Stim2']].values)

# exchange the Answer if the order has changed
df['Answer'] = df['Answer'].where(df['Stim1'] == s, df['Answer'].replace({1:2,2:1}))

output:

  Stim1 Stim2  Answer
0     a     b       1
1     a     b       1
2     a     c       2
3     a     c       2
4     b     c       2
5     b     c       2

Upvotes: 2

Related Questions