Reputation: 2426
I have a DataFrame and I want to merge the rows that share a value:
import numpy as np
import pandas as pd

toy = [
    [10, 11],
    [21, 22],
    [11, 15],
    [22, 23],
    [15, 33],
]
toy = pd.DataFrame(toy, columns=['ID1', 'ID2'])
   ID1  ID2
0   10   11
1   21   22
2   11   15
3   22   23
4   15   33
What I am hoping to get afterwards is
    0   1   2     3
0  10  11  15  33.0
1  21  22  23   NaN
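In other words, any two rows that have a value in common should be merged, and this should carry over transitively (10-11, 11-15 and 15-33 all end up in one row).
My own solution is far from elegant, and I am looking for the right way to do this. Recursion? Groupby?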
#### Feel Free to NOT read this... ###
for k in range(100):
    print(k)
    merge_df = []
    merged_indices = []
    for i, row in toy.iterrows():
        if i in merged_indices:
            continue
        cp = toy.copy()
        merge_rows = cp[cp.isin(row.values)].dropna(how="all")
        merged_indices = merged_indices + list(merge_rows.index)
        merge_rows = np.array(toy.iloc[merge_rows.index]).flatten()
        merge_rows = np.unique(merge_rows)
        merge_df.append(merge_rows)
    if toy.shape[0] == len(merge_df):
        break
    toy = pd.DataFrame(merge_df).copy()
Upvotes: 4
Views: 181
Reputation: 323396
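This sounds like a network (graph) problem: treat each row as an edge between ID1 and ID2, and the merged rows are the connected components. I would use networkx: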
import networkx as nx
G=nx.from_pandas_edgelist(toy, 'ID1', 'ID2')
l=list(nx.connected_components(G))
newdf=pd.DataFrame(l)
newdf
Out[896]:
    0   1   2     3
0  33  10  11  15.0
1  21  22  23   NaN
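If networkx is not an option, the same connected-components idea can be sketched with a small union-find in plain Python. This is only an illustration under my own assumptions: `merge_connected` is a hypothetical helper name (not part of pandas or networkx), and it sorts each group so the values come out in ascending order, which the set-based output above does not guarantee.

```python
def merge_connected(pairs):
    """Group values that are linked, directly or transitively, by any pair."""
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each (a, b) pair is an edge: join the two groups it connects.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect every value under its root, then sort for a stable result.
    groups = {}
    for value in parent:
        groups.setdefault(find(value), []).append(value)
    return sorted(sorted(group) for group in groups.values())


# Same rows as the toy DataFrame in the question.
pairs = [(10, 11), (21, 22), (11, 15), (22, 23), (15, 33)]
merged = merge_connected(pairs)
print(merged)
```

With the question's DataFrame, the pairs could presumably be taken from toy[['ID1', 'ID2']].values.tolist(), and pd.DataFrame(merged) should then give one row per group, with the shorter rows padded by NaN.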
Upvotes: 2