estemendoza
estemendoza

Reputation: 3085

Group values and merge on Pandas with Python

I have the following data on a csv:

c1    c2
1     2
1     3
3     4
3     5
4     6

What I need to do is to process that data and have the following output:

c1    c2
1     2,3
3     4,5
4     6

Is this possible with pandas?

Thanks

Upvotes: 1

Views: 57

Answers (1)

jezrael
jezrael

Reputation: 863801

You can first convert column c2 to string by astype and then groupby with apply join. Last reset_index:

df['c2'] = df['c2'].astype(str)
print df.groupby('c1')['c2'].apply(','.join).reset_index()
   c1   c2
0   1  2,3
1   3  4,5
2   4    6

If you need drop_duplicates:

print df
   c1  c2
0   1   2
1   1   3
2   1   2
3   1   3
4   3   4
5   3   5
6   4   6

df['c2'] = df['c2'].astype(str)
df = df.groupby('c1')['c2'].apply(lambda x: ','.join(x.drop_duplicates())).reset_index()
print df
   c1   c2
0   1  2,3
1   3  4,5
2   4    6

If you need sort DataFrame by length of values in column c2, use str.len and sort_values. Last you can drop column sort:

print df
   c1  c2
0   1   4
1   1   5
2   4   6
3   2   7
4   2   3
5   2   2
6   2   3

df['c2'] = df['c2'].astype(str)
df = df.groupby('c1')['c2'].apply(lambda x: ','.join(x.drop_duplicates())).reset_index()

df['sort'] = df['c2'].str.len()
df = df.sort_values('sort')
df = df.drop('sort',axis=1)
print df
   c1     c2
2   4      6
0   1    4,5
1   2  7,3,2

print df.reset_index(drop=True)
   c1     c2
0   4      6
1   1    4,5
2   2  7,3,2

Upvotes: 5

Related Questions