Reputation: 1460
How do I ignore null and None values in a DataFrame, group the rows by ID, and reshape the data?
id A B C
A [] [] []
A [aaaa] None []
A [] [bbbb] None
A [] [] [ccccc]
A None [] []
B [] [] [zzzz]
B [] None []
B [xxxx] None None
B [] [] []
B None [yyyy] None
Can we rearrange the data set based on ID?
Output:
id A B C
A aaaa bbbb ccccc
B xxxx yyyy zzzz
Upvotes: 0
Views: 1251
Reputation: 862511
If there are None values (NoneType) and lists in all columns other than id, then create an index from id, get the first value of each list by indexing with str[0], replace the Nones with NaNs, and finally aggregate with GroupBy.first.
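For reproducibility, here is a hypothetical reconstruction of the sample frame (not part of the original answer); the answer evidently uses numeric ids 1 and 2 in place of the question's A and B, as the type dump printed next confirms:
import numpy as np
import pandas as pd

# Reconstructed from the question's table and the applymap(type) output below.
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'A': [[], ['aaaa'], [], [], None, [], [], ['xxxx'], [], None],
    'B': [[], None, ['bbbb'], [], [], [], None, None, [], ['yyyy']],
    'C': [[], [], None, ['ccccc'], [], ['zzzz'], [], None, [], None],
})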
print (df.applymap(type))
id A B C
0 <class 'int'> <class 'list'> <class 'list'> <class 'list'>
1 <class 'int'> <class 'list'> <class 'NoneType'> <class 'list'>
2 <class 'int'> <class 'list'> <class 'list'> <class 'NoneType'>
3 <class 'int'> <class 'list'> <class 'list'> <class 'list'>
4 <class 'int'> <class 'NoneType'> <class 'list'> <class 'list'>
5 <class 'int'> <class 'list'> <class 'list'> <class 'list'>
6 <class 'int'> <class 'list'> <class 'NoneType'> <class 'list'>
7 <class 'int'> <class 'list'> <class 'NoneType'> <class 'NoneType'>
8 <class 'int'> <class 'list'> <class 'list'> <class 'list'>
9 <class 'int'> <class 'NoneType'> <class 'list'> <class 'NoneType'>
df1 = (df.set_index('id')
         .apply(lambda x: x.str[0])          # first element of each list; NaN for [] or None
         .mask(lambda x: x.isna(), np.nan)   # normalize missing values to NaN
         .groupby('id')
         .first())                           # keep the first non-NaN value per id
print (df1)
A B C
id
1 aaaa bbbb ccccc
2 xxxx yyyy zzzz
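This works because GroupBy.first returns the first non-missing value per group, which is why normalizing everything missing to NaN matters. A minimal demo of that behavior (not part of the original answer):
demo = pd.DataFrame({'id': [1, 1, 2], 'v': [np.nan, 'a', 'b']})
print (demo.groupby('id').first())
#     v
# id
# 1   a
# 2   b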
Another idea:
df1 = (df.set_index('id')
         .applymap(lambda x: np.nan if x == [] else x)  # empty lists -> NaN
         .stack()                                       # long format; drops NaN and None cells
         .unstack()                                     # back to one row per id
         .apply(lambda x: x.str[0])                     # unwrap the single-item lists
         )
print (df1)
A B C
id
1 aaaa bbbb ccccc
2 xxxx yyyy zzzz
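Either way, if id should appear as a regular column, as in the question's desired output, a reset_index call (a small addition, not from the original answer) restores it:
print (df1.reset_index())
#    id     A     B      C
# 0   1  aaaa  bbbb  ccccc
# 1   2  xxxx  yyyy   zzzz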
Upvotes: 1