Filling the missing rows in pandas dataframe

Question

data = { 
    'node1': [1, 1,1, 2,2,5],
 'node2': [8,16,22,5,25,10],
 'weight': [1,1,1,1,1,1], }
df = pd.DataFrame(data, columns = ['node1','node2','weight'])

df2=df.assign(Cu=df.groupby('node1').cumcount()).set_index('Cu').groupby('node1') \
      .apply(lambda x : x['node2']).unstack('Cu').fillna(np.nan)

Output:

1       8.0    16.0     22.0
2       5.0    25.0     0.0
5       10.0   0.0      0.0

This the output I am gettting but I require the output:

1  8   16  22    
2  5   25  0    
3  0   0   0    
4  0   0   0    
5  10  0   0

The rows which are missing in the data like the 3,4 should have the columns as zeros

Zero · Accepted Answer

Here are few ways of doing it.

Option 1

In [36]: idx = np.arange(df.node1.min(), df.node1.max()+1)

In [37]: df.groupby('node1')['node2'].apply(list).apply(pd.Series).reindex(idx).fillna(0)
Out[37]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0

Option 2

In [39]: (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values))
            .unstack().reindex(idx).fillna(0))
Out[39]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0

Option 3

In [55]: pd.DataFrame.from_dict(
               {i: x.values  for i, x in df.groupby('node1')['node2']},
                orient='index').reindex(idx).fillna(0)
Out[55]:
      0     1     2
1   8.0  16.0  22.0
2   5.0  25.0   0.0
3   0.0   0.0   0.0
4   0.0   0.0   0.0
5  10.0   0.0   0.0

And, measure the efficiency, readability based on your usecase.

Filling the missing rows in pandas dataframe

Answers (2)

Related Questions