PaulFaguet

Reputation: 13

How to merge rows of a DataFrame that share the same value

I am facing a problem with pandas in Python that I can't solve. I would like to merge/combine/regroup the rows that have the same url.

EDIT: I have a DataFrame that looks like this:

url col1 col2 col3 col4
aaa xx yy
bbb zz
aaa ee
AA

I would like something like this:

url col1 col2 col3 col4
aaa ee xx yy
bbb zz cc
AA

I've tried using groupby, but my df contains rows that don't have a URL, and I want to keep them. I've also tried merge with how='inner', which gives me pretty good results, but I don't know why it multiplies the number of rows in my df.

Thank you.

Upvotes: 1

Views: 602

Answers (3)

René

Reputation: 4827

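import pandas as pd
import numpy as np

# Two frames indexed by url, each holding values the other one is missing
df1 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':['A', np.nan], 'col2':[np.nan, 'B']}).set_index('url')
df2 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':[np.nan, 'C'], 'col2':['D', np.nan]}).set_index('url')
# Fill the NaN cells of df1 with the value found at the same url/column in df2
df1.fillna(df2, inplace=True)
print(df1)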

Result:

     col1 col2
url           
url1    A    D
url2    C    B

Upvotes: 0

Mohammad Nuseirat

Reputation: 46

I think you should use groupby, nunique, and np.where to solve this. See the following discussion of a similar problem: pandas-dataframe-check-if-multiple-rows-have-the-same-value

Upvotes: 0

Emma

Reputation: 9308

You can use groupby and first. For each url, first returns the first non-null value found in every column.

df = df.groupby('url', as_index=False).first()
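
Note that groupby drops rows whose key is NaN by default, so the rows without a url would disappear. Here is a minimal sketch of one way to keep them, assuming the missing cells are NaN: split off the rows with no url, group the rest, then concatenate. The column placement in the sample frame is a guess based on the table in the question, and the names has_url, merged and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data; which column each value sits in is an assumption
df = pd.DataFrame({
    "url":  ["aaa", "bbb", "aaa", np.nan],
    "col1": ["xx", np.nan, np.nan, np.nan],
    "col2": ["yy", np.nan, np.nan, np.nan],
    "col3": [np.nan, "zz", np.nan, np.nan],
    "col4": [np.nan, np.nan, "ee", "AA"],
})

# Rows that have a url get collapsed; rows without one are left alone
has_url = df["url"].notna()

# first() keeps the first non-null value per column within each url group
merged = df[has_url].groupby("url", as_index=False).first()

# Put the url-less rows back underneath the merged ones
result = pd.concat([merged, df[~has_url]], ignore_index=True)
print(result)
```

Passing dropna=False to groupby is not equivalent here: it would put all url-less rows into a single group and merge them together instead of keeping them as separate rows.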

Upvotes: 1
