Reputation: 39
I have a set of data with 5 cells and some rows containing the same value in row[0], row[1] and row[3].
for example:
1 2 3 4 5
A AB AD AC AE
R c W E F
A AB B AC D
...
I want to merge the rows that has the same value in row[0] by adding the unique values together. So the output after processing should be something like:
1 2 3 4 5
A AB ADB AC AED
R c W E F
...
I am trying to do this in python only, by setting a dictionary with row[0] as the key, but I could not figure out a way to loop over the other rows and append them. I think the code should look something like:
result = {}
with open('mydata.csv', 'r', encoding='utf-8') as f:
reader = csv.reader(f)
l = [ l for l in reader]
for row in l:
idx = row[0]
values = row[1:]
if idx in result:
result[idx] = [values[0], values[1], [loop and append], values[3], [loop and append]]
print(result[idx])
else:
result[idx] = values
Upvotes: 0
Views: 47
Reputation: 29732
Use pandas.DataFrame.groupby
with agg
and pd.Series.unique
:
new_df = df.groupby('1').agg(lambda x: ''.join(x.unique())).reset_index()
print(new_df)
Output:
1 2 3 4 5
0 A AB ADB AC AED
1 R c W E F
Upvotes: 1