Reputation: 795
I have a python/pandas dataframe has two columns: (SHEET column is list)
VNAME SHEET
0 atnpi [HSP, HHA, HO_, INP]
1 atupi [HSP, HHA, INP]
2 carnm [HB_]
3 clmid [HSP, HHA, HB_, HO_, INP]
4 clseq [HSP, HHA, HO_, INP]
I want to transfer it to a dataframe like
VNAME HSP HHA HB_ HO_ INP
0 atnpi TRUE TRUE TRUE TRUE
1 atupi TRUE TRUE TRUE
2 carnm TRUE
3 clmid TRUE TRUE TRUE TRUE TRUE
4 clseq TRUE TRUE TRUE TRUE
Upvotes: 1
Views: 86
Reputation: 153460
One way you can do this is as follows:
df1 = df1.set_index('VNAME')
(df1['SHEET'].apply(pd.Series).stack()
.reset_index(1, drop=True)
.to_frame().assign(key=True)
.set_index(0,append=True)['key']
.unstack()
.fillna('')
.rename_axis(None,1)
.reset_index())
Output:
VNAME HB_ HHA HO_ HSP INP
0 atnpi True True True True
1 atupi True True True
2 carnm True
3 clmid True True True True True
4 clseq True True True True
OR to get string 'TRUE'
(df1['SHEET'].apply(pd.Series).stack()
.reset_index(1, drop=True)
.to_frame().assign(key='TRUE')
.set_index(0,append=True)['key']
.unstack().fillna('')
.rename_axis(None,1)
.reset_index())
Output:
VNAME HB_ HHA HO_ HSP INP
0 atnpi TRUE TRUE TRUE TRUE
1 atupi TRUE TRUE TRUE
2 carnm TRUE
3 clmid TRUE TRUE TRUE TRUE TRUE
4 clseq TRUE TRUE TRUE TRUE
Option 2
df1 = df1.set_index('VNAME')
df2 = df1['SHEET'].apply(pd.Series).stack().reset_index(name='SHEET').drop('level_1', axis=1)
pd.crosstab(df2.VNAME, df2.SHEET).astype(bool).replace(False,'').reset_index().rename_axis(None,1)
Output:
VNAME HB_ HHA HO_ HSP INP
0 atnpi True True True True
1 atupi True True True
2 carnm True
3 clmid True True True True True
4 clseq True True True True
Upvotes: 2