kbk
kbk

Reputation: 605

python-pandas merging the cell/data frame content

i was trying to format the input data to form a format as mentioned below, i am not sure how to achieve how to do it as i am new to pandas

Input data in xlsx format:

enter image description here

which is parsed to form below data format

import pandas as pd

excel_file = 'sample.xlsx'
ip_data_read = pd.read_excel(excel_file)
ip_data_read["Test Case(ID / Brief Description)"] = ip_data_read["Test Case(ID / Brief Description)"].ffill()

out put print like this:

Test Case(ID / Brief Description)       Action / Step           Expected Result (Description)
       TC_001                                a                             b
       TC_001                                c                             d
       TC_001                                e                           NaN
       TC_001                                f                           NaN
       TC_001                              NaN                             g
       TC_002                              a11                        qwerty
       TC_002                              qqq                        mnbghd
       TC_002                              NaN                        cccc12
       TC_002                         qwwqwq33                           NaN
       TC_003                               a1                            b1
       TC_004                              NaN                            c2
       TC_005                              NaN                            d2
       TC_006                              NaN                            s2
       TC_007                               a3                            d4
       TC_099                          abcdefg                           NaN
       TC_100                                a                             b
       TC_100                                c                             d
       TC_100                                e                           NaN
       TC_100                                f                           NaN
       TC_100                              NaN                             g
       TC_101                               xx                            yy
       TC_101                               er                            gf
       TC_101                               vv                             i

i wanted the dataframe to be something like,

Test Case(ID / Brief Description)       Steps           
       TC_001                            a,b            
                                         c,d            
                                         e              
                                         f              
                                         g               
       TC_002                         a11,qwerty                
       TC_002                         qqq,mnbghd                
       TC_002                         cccc12                    
       TC_002                         qwwqwq33   

       TC_003                          a1,b1                     
       TC_004                          c2                         
       TC_005                          d2                          
       TC_006                          s2                         
       TC_007                          a3,d4                     
       TC_099                          abcdefg  

       TC_100                          a,b            
                                       c,d            
                                       e              
                                       f              
                                        g
       TC_101                          xx ,yy                         
                                       er ,gf                         
                                       vv ,i    

for example the value/steps associated with TC_001 should give me a string like

   a,b            
     c,d            
     e              
     f              
     g 

Upvotes: 1

Views: 55

Answers (1)

jezrael
jezrael

Reputation: 862396

Use Series.str.cat with replace missing values:

s = df.pop('Expected Result (Description)').fillna('')
df['Steps'] = df.pop('Action / Step').fillna('').str.cat(s, sep=',').str.strip(',')

If there is many columns:

df1 = (df.set_index('Test Case(ID / Brief Description)')
        .apply(lambda x: ','.join(x.dropna()), axis=1)
        .reset_index())

Sample:

df = pd.DataFrame({'Test Case(ID / Brief Description)':range(4),
                   'Action / Step':[np.nan, np.nan,'a', 'd'],
                   'Expected Result (Description)':['s', np.nan, np.nan,'m']})
print (df)
   Test Case(ID / Brief Description) Action / Step  \
0                                  0           NaN   
1                                  1           NaN   
2                                  2             a   
3                                  3             d   

  Expected Result (Description)  
0                             s  
1                           NaN  
2                           NaN  
3                             m  

df1 = (df.set_index('Test Case(ID / Brief Description)')
        .apply(lambda x: ','.join(x.dropna()), axis=1)
        .reset_index(name='Steps'))
print (df1)
   Test Case(ID / Brief Description) Steps
0                                  0     s
1                                  1      
2                                  2     a
3                                  3   d,m

Upvotes: 1

Related Questions