Constantino
Constantino

Reputation: 2391

pandas merge dataframe and pivot creating new columns

I've got two input dataframes

df1 (note, this DF could have more columns of data)

   Sample Animal  Time     Sex
0       1      A   one    male
1       2      A   two    male
2       3      B   one  female
3       4      C   one    male
4       5      D   one  female

and df2

          a    b    c
Sample               
1       0.2  0.4  0.3
2       0.5  0.7  0.2
3       0.4  0.1  0.9
4       0.4  0.2  0.3
5       0.6  0.2  0.4

and I'd like to combine them so that I get the following:

        one_a  one_b  one_c  two_a  two_b  two_c     Sex
Animal                                                  
A         0.2    0.4    0.3    0.5    0.7    0.2    male
B         0.4    0.1    0.9    NaN    NaN    NaN  female
C         0.4    0.2    0.3    NaN    NaN    NaN    male
D         0.6    0.2    0.4    NaN    NaN    NaN  female

This is how I'm doing things:

df2.reset_index(inplace = True)
df3 = pd.melt(df2, id_vars=['Sample'], value_vars=list(cols))
df4 = pd.merge(df3, df1, on='Sample')
df4['moo'] = df4['Group'] + '_' + df4['variable']
df5 = pd.pivot_table(df4, values='value', index='Animal', columns='moo')
df6 = df1.groupby('Animal').agg('first')
pd.concat([df5, df6], axis=1).drop('Sample',1).drop('Group',1)

This works just fine, but could potentially be slow for large datasets. I'm wondering if any panda-pros see a better (read faster, more efficient)? I'm new to pandas and can imagine there are some shortcuts here that I don't know about.

Upvotes: 1

Views: 1864

Answers (1)

Jianxun Li
Jianxun Li

Reputation: 24742

A few steps here. The key is that in order to create columns like one_a one_b .... two_c, we need add Time column to Sample index to build a multi-level index and then unstack to get the required form. Then, a groupby on Animal index is required to aggregate and reduce the number of NaNs. The rest are just some manipulations on format.

import pandas as pd

# your data
# ==============================
# set index
df1 = df1.set_index('Sample')

print(df1)

       Animal Time     Sex
Sample                    
1           A  one    male
2           A  two    male
3           B  one  female
4           C  one    male
5           D  one  female

print(df2)


          a    b    c
Sample               
1       0.2  0.4  0.3
2       0.5  0.7  0.2
3       0.4  0.1  0.9
4       0.4  0.2  0.3
5       0.6  0.2  0.4



# processing
# =============================
df = df1.join(df2)

df_temp = df.set_index(['Animal', 'Sex','Time'], append=True).unstack()

print(df_temp)


                        a         b         c     
Time                  one  two  one  two  one  two
Sample Animal Sex                                 
1      A      male    0.2  NaN  0.4  NaN  0.3  NaN
2      A      male    NaN  0.5  NaN  0.7  NaN  0.2
3      B      female  0.4  NaN  0.1  NaN  0.9  NaN
4      C      male    0.4  NaN  0.2  NaN  0.3  NaN
5      D      female  0.6  NaN  0.2  NaN  0.4  NaN

# rename the columns if you wish
df_temp.columns = ['{}_{}'.format(x, y) for x, y in zip(df_temp.columns.get_level_values(1), df_temp.columns.get_level_values(0))]

print(df_temp)

                      one_a  two_a  one_b  two_b  one_c  two_c
Sample Animal Sex                                             
1      A      male      0.2    NaN    0.4    NaN    0.3    NaN
2      A      male      NaN    0.5    NaN    0.7    NaN    0.2
3      B      female    0.4    NaN    0.1    NaN    0.9    NaN
4      C      male      0.4    NaN    0.2    NaN    0.3    NaN
5      D      female    0.6    NaN    0.2    NaN    0.4    NaN


result = df_temp.reset_index('Sex').groupby(level='Animal').agg(max).sort_index(axis=1)

print(result)

           Sex  one_a  one_b  one_c  two_a  two_b  two_c
Animal                                                  
A         male    0.2    0.4    0.3    0.5    0.7    0.2
B       female    0.4    0.1    0.9    NaN    NaN    NaN
C         male    0.4    0.2    0.3    NaN    NaN    NaN
D       female    0.6    0.2    0.4    NaN    NaN    NaN

Upvotes: 1

Related Questions