kkjoe
kkjoe

Reputation: 795

Python/Pandas transfer the format of dataframe

I have a python/pandas dataframe has two columns: (SHEET column is list)

    VNAME   SHEET
0   atnpi   [HSP, HHA, HO_, INP]
1   atupi   [HSP, HHA, INP]
2   carnm   [HB_]
3   clmid   [HSP, HHA, HB_, HO_, INP]
4   clseq   [HSP, HHA, HO_, INP]

I want to transfer it to a dataframe like

    VNAME   HSP    HHA   HB_   HO_  INP 
0   atnpi   TRUE   TRUE        TRUE TRUE
1   atupi   TRUE   TRUE             TRUE
2   carnm                TRUE
3   clmid   TRUE   TRUE  TRUE  TRUE TRUE
4   clseq   TRUE   TRUE        TRUE TRUE

Upvotes: 1

Views: 86

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

One way you can do this is as follows:

df1 = df1.set_index('VNAME')

(df1['SHEET'].apply(pd.Series).stack()
            .reset_index(1, drop=True)
            .to_frame().assign(key=True)
            .set_index(0,append=True)['key']
            .unstack()
            .fillna('')
            .rename_axis(None,1)
            .reset_index())

Output:

   VNAME   HB_   HHA   HO_   HSP   INP
0  atnpi        True  True  True  True
1  atupi        True        True  True
2  carnm  True                        
3  clmid  True  True  True  True  True
4  clseq        True  True  True  True

OR to get string 'TRUE'

(df1['SHEET'].apply(pd.Series).stack()
                  .reset_index(1, drop=True)
                  .to_frame().assign(key='TRUE')
                  .set_index(0,append=True)['key']
                  .unstack().fillna('')
                  .rename_axis(None,1)
                  .reset_index())

Output:

   VNAME   HB_   HHA   HO_   HSP   INP
0  atnpi        TRUE  TRUE  TRUE  TRUE
1  atupi        TRUE        TRUE  TRUE
2  carnm  TRUE                        
3  clmid  TRUE  TRUE  TRUE  TRUE  TRUE
4  clseq        TRUE  TRUE  TRUE  TRUE

Option 2

df1 = df1.set_index('VNAME')
df2 = df1['SHEET'].apply(pd.Series).stack().reset_index(name='SHEET').drop('level_1', axis=1)
pd.crosstab(df2.VNAME, df2.SHEET).astype(bool).replace(False,'').reset_index().rename_axis(None,1)

Output:

   VNAME   HB_   HHA   HO_   HSP   INP
0  atnpi        True  True  True  True
1  atupi        True        True  True
2  carnm  True                        
3  clmid  True  True  True  True  True
4  clseq        True  True  True  True

Upvotes: 2

Related Questions