Reputation: 2365
data = {
'node1': [1, 1,1, 2,2,5],
'node2': [8,16,22,5,25,10],
'weight': [1,1,1,1,1,1], }
df = pd.DataFrame(data, columns = ['node1','node2','weight'])
df2=df.assign(Cu=df.groupby('node1').cumcount()).set_index('Cu').groupby('node1') \
.apply(lambda x : x['node2']).unstack('Cu').fillna(np.nan)
Output:
1 8.0 16.0 22.0
2 5.0 25.0 0.0
5 10.0 0.0 0.0
This the output I am gettting but I require the output:
1 8 16 22
2 5 25 0
3 0 0 0
4 0 0 0
5 10 0 0
The rows which are missing in the data like the 3,4 should have the columns as zeros
Upvotes: 0
Views: 994
Reputation: 76967
Here are few ways of doing it.
Option 1
In [36]: idx = np.arange(df.node1.min(), df.node1.max()+1)
In [37]: df.groupby('node1')['node2'].apply(list).apply(pd.Series).reindex(idx).fillna(0)
Out[37]:
0 1 2
node1
1 8.0 16.0 22.0
2 5.0 25.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
5 10.0 0.0 0.0
Option 2
In [39]: (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values))
.unstack().reindex(idx).fillna(0))
Out[39]:
0 1 2
node1
1 8.0 16.0 22.0
2 5.0 25.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
5 10.0 0.0 0.0
Option 3
In [55]: pd.DataFrame.from_dict(
{i: x.values for i, x in df.groupby('node1')['node2']},
orient='index').reindex(idx).fillna(0)
Out[55]:
0 1 2
1 8.0 16.0 22.0
2 5.0 25.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
5 10.0 0.0 0.0
And, measure the efficiency, readability based on your usecase.
Upvotes: 4
Reputation: 210922
In [15]: idx = np.arange(df.node1.min(), df.node1.max()+1)
In [16]: df.pivot_table(index='node1',
columns=df.groupby('node1').cumcount(),
values='node2',
fill_value=0) \
.reindex(idx) \
.fillna(0)
Out[16]:
0 1 2
node1
1 8.0 16.0 22.0
2 5.0 25.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
5 10.0 0.0 0.0
Upvotes: 3