pylearner

Reputation: 1460

How to ignore null values in a data frame and build a new data frame based on a column

How do I ignore empty lists and None values in a data frame and rebuild it so that there is one row per id?

id  A       B       C
A   []      []      []
A   [aaaa]  None    []
A   []      [bbbb]  None
A   []      []      [ccccc]
A   None    []      []
B   []      []      [zzzz]
B   []      None    []
B   [xxxx]  None    None
B   []      []      []
B   None    [yyyy]  None

Can we rearrange the data set based on id?

Output:

id  A     B     C
A   aaaa  bbbb  ccccc
B   xxxx  yyyy  zzzz

Upvotes: 0

Views: 1251

Answers (1)

jezrael

Reputation: 862511

If every column other than id contains lists and None values (NoneType), set id as the index, take the first element of each list by indexing with str[0], replace None with NaN, and finally aggregate with GroupBy.first:

print (df.applymap(type))
              id                   A                   B                   C
0  <class 'int'>      <class 'list'>      <class 'list'>      <class 'list'>
1  <class 'int'>      <class 'list'>  <class 'NoneType'>      <class 'list'>
2  <class 'int'>      <class 'list'>      <class 'list'>  <class 'NoneType'>
3  <class 'int'>      <class 'list'>      <class 'list'>      <class 'list'>
4  <class 'int'>  <class 'NoneType'>      <class 'list'>      <class 'list'>
5  <class 'int'>      <class 'list'>      <class 'list'>      <class 'list'>
6  <class 'int'>      <class 'list'>  <class 'NoneType'>      <class 'list'>
7  <class 'int'>      <class 'list'>  <class 'NoneType'>  <class 'NoneType'>
8  <class 'int'>      <class 'list'>      <class 'list'>      <class 'list'>
9  <class 'int'>  <class 'NoneType'>      <class 'list'>  <class 'NoneType'>

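import numpy as np

# Index by id, take the first element of each list (empty lists give NaN),
# normalize None to NaN, then keep the first non-missing value per id
df1 = (df.set_index('id')
         .apply(lambda x: x.str[0])
         .mask(lambda x: x.isna(), np.nan)
         .groupby('id')
         .first())
print (df1)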
       A     B      C
id                   
1   aaaa  bbbb  ccccc
2   xxxx  yyyy   zzzz
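
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my reconstruction of the question's data (with ids 1 and 2 standing in for A and B), and first_item is a hypothetical helper I use in place of str[0] so the handling of empty lists and None is explicit. It is one way to express the same idea, not the only one.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data: every cell is a list
# (possibly empty) or None; ids 1 and 2 stand in for A and B.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "A": [[], ["aaaa"], [], [], None, [], [], ["xxxx"], [], None],
    "B": [[], None, ["bbbb"], [], [], [], None, None, [], ["yyyy"]],
    "C": [[], [], None, ["ccccc"], [], ["zzzz"], [], None, [], None],
})


def first_item(cell):
    """Hypothetical helper: first element of a non-empty list, else NaN."""
    if isinstance(cell, list) and cell:
        return cell[0]
    return np.nan


# Unwrap every cell column by column, then keep the first
# non-missing value per id in each column.
values = df.set_index("id").apply(lambda col: col.map(first_item))
df1 = values.groupby(level="id").first()
print(df1)
```

GroupBy.first skips missing values, so each id ends up with the single non-empty entry of each column. If an id had several non-empty entries in one column, only the first would be kept.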

Another idea:

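# Replace empty lists with NaN so that stack() can discard the missing cells,
# then unstack() back to one row per id and unwrap the remaining lists
df1 = (df.set_index('id')
         .applymap(lambda x: np.nan if x == [] else x)
         .stack()
         .unstack()
         .apply(lambda x: x.str[0])
       )
print (df1)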
       A     B      C
id                   
1   aaaa  bbbb  ccccc
2   xxxx  yyyy   zzzz
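
The same reshape-and-drop idea can also be sketched with melt and pivot instead of stack and unstack. This is a swapped-in variant, not the code above: it drops the missing cells explicitly with dropna rather than relying on how stack treats them. As before, the sample data and the first_item helper are my assumptions.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data (ids 1 and 2 stand in for A and B).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "A": [[], ["aaaa"], [], [], None, [], [], ["xxxx"], [], None],
    "B": [[], None, ["bbbb"], [], [], [], None, None, [], ["yyyy"]],
    "C": [[], [], None, ["ccccc"], [], ["zzzz"], [], None, [], None],
})


def first_item(cell):
    """Hypothetical helper: first element of a non-empty list, else NaN."""
    if isinstance(cell, list) and cell:
        return cell[0]
    return np.nan


# Long format: one row per (id, column, value) cell.
long_df = df.melt(id_vars="id", var_name="column", value_name="value")
long_df["value"] = long_df["value"].map(first_item)

# Drop the empty cells explicitly, then reshape back to one row per id.
long_df = long_df.dropna(subset=["value"])
df1 = long_df.pivot(index="id", columns="column", values="value")
print(df1)
```

Unlike GroupBy.first, pivot raises an error when an id has more than one non-empty value in the same column, which can be useful if that situation should not be silently resolved.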

Upvotes: 1
