user14784453

Reputation:

Merge the DataFrame based on a condition

I am given this sample DataFrame:

  Name  value  A         Msg_A  Invalid_A      Msg_B  Invalid_B        Msg_C  Invalid_C
0    j      3  A  Not VAid XYZ       True      VAlid      False  Not Valid_c       True
1    B      4  B      VAid ABC      False  notVAlidb       True  Not Valid_c       True
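
For reproducibility, here is a minimal sketch constructing this sample frame (values copied from the table above; adjust to the real data):

import pandas as pd

# sample frame matching the table above (values are illustrative)
df = pd.DataFrame({
    'Name': ['j', 'B'],
    'value': [3, 4],
    'A': ['A', 'B'],
    'Msg_A': ['Not VAid XYZ', 'VAid ABC'],
    'Invalid_A': [True, False],
    'Msg_B': ['VAlid', 'notVAlidb'],
    'Invalid_B': [False, True],
    'Msg_C': ['Not Valid_c', 'Not Valid_c'],
    'Invalid_C': [True, True],
})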

I want a df like this:

  Name  value  A         Msg_A  Invalid_A      Msg_B  Invalid_B        Msg_C  Invalid_C                   Message
0    j      3  A  Not VAid XYZ       True      VAlid      False  Not Valid_c       True  Not VAid XYZ,Not Valid_c
1    B      4  B      VAid ABC      False  notVAlidb       True  Not Valid_c       True     notVAlidb,Not Valid_c

I want to merge the Msg_ values into a Message column wherever the corresponding Invalid_ column contains True.

Shape: print(df.filter(like='Invalid_').values[:,None].shape) --> (4, 1, 8)

I tried:

df['message'] = (df.filter(like='Status_')
                   .where(df.filter(like='Invalid_').values[:, None].reshape(4, 8))
                   .apply(lambda x: ','.join(x.dropna()), 1))

Upvotes: 1

Views: 60

Answers (1)

jezrael

Reputation: 863291

Use wide_to_long to reshape, then filter the rows where Invalid is True, and last aggregate with join:

df1 = (pd.wide_to_long(df.reset_index(),
                       stubnames=['Msg','Invalid'],
                       sep='_',
                       suffix=r'\w+',
                       i=['index','Name','value','A'],
                       j='tmp')
         .reset_index())

df1 = df1[df1['Invalid']]

df['message'] = df1.groupby('index')['Msg'].agg(', '.join)
print (df)
  Name  value  A         Msg_A  Invalid_A      Msg_B  Invalid_B        Msg_C  \
0    j      3  A  Not VAid XYZ       True      VAlid      False  Not Valid_c   
1    B      4  B      VAid ABC      False  notVAlidb       True  Not Valid_c   

   Invalid_C                    message  
0       True  Not VAid XYZ, Not Valid_c  
1       True     notVAlidb, Not Valid_c  
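
For intuition, inspecting the filtered long-format frame df1 shows which messages survive per original row; a minimal sketch using the df1 from above:

# df1 is in long format with columns index, Name, value, A, tmp, Msg, Invalid;
# after the boolean filter only rows with Invalid == True remain
# (suffixes A and C for original row 0, B and C for row 1 in the sample data)
print(df1[['index', 'tmp', 'Msg', 'Invalid']])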

You can generate all columns not starting with Msg or Invalid and pass them to parameter i:

cols = df.columns[~df.columns.str.startswith(('Msg','Invalid'))].tolist()
                                
df1 = (pd.wide_to_long(df.reset_index(),
                       stubnames=['Msg','Invalid'],
                       sep='_',
                       suffix=r'\w+',
                       i=['index'] + cols,
                       j='tmp')
         .reset_index())

df1 = df1[df1['Invalid']]

df['message'] = df1.groupby('index')['Msg'].agg(', '.join)
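
If some row has no True in any Invalid_ column, it will not appear in the groupby result and its message becomes NaN after the assignment above; a small optional addition, assuming an empty string is preferred there:

# rows with no invalid message get NaN from the assignment above; fill if needed
df['message'] = df['message'].fillna('')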

Your solution:

The same number of columns is necessary in both df1 and df2, so it is possible to set df2.columns from df1.columns:

df1 = df.filter(like='Msg_')
df2 = df.filter(like='Invalid_')
# set column names
df2.columns = df1.columns

df['message'] = df1.where(df2).apply(lambda x:','.join(x.dropna()),1)
print (df)
  Name  value  A         Msg_A  Invalid_A      Msg_B  Invalid_B        Msg_C  \
0    j      3  A  Not VAid XYZ       True      VAlid      False  Not Valid_c   
1    B      4  B      VAid ABC      False  notVAlidb       True  Not Valid_c   

   Invalid_C                   message  
0       True  Not VAid XYZ,Not Valid_c  
1       True     notVAlidb,Not Valid_c  
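
A variant of the same fix, assuming the Msg_ and Invalid_ columns are in the same suffix order: pass the boolean values positionally (e.g. via to_numpy()), so the columns do not need to be renamed.

# positional boolean mask - where() applies a plain ndarray by position,
# so df2's column names do not have to match df1's
msg = df.filter(like='Msg_')
mask = df.filter(like='Invalid_').to_numpy()
df['message'] = msg.where(mask).apply(lambda x: ','.join(x.dropna()), axis=1)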

Upvotes: 1
