Reputation: 3085
I have the following data on a csv:
c1 c2
1 2
1 3
3 4
3 5
4 6
What I need to do is to process that data and have the following output:
c1 c2
1 2,3
3 4,5
4 6
Is this possible with pandas?
Thanks
Upvotes: 1
Views: 57
Reputation: 863801
You can first convert column c2
to string
by astype
and then groupby
with apply
join
. Last reset_index
:
df['c2'] = df['c2'].astype(str)
print df.groupby('c1')['c2'].apply(','.join).reset_index()
c1 c2
0 1 2,3
1 3 4,5
2 4 6
If you need drop_duplicates
:
print df
c1 c2
0 1 2
1 1 3
2 1 2
3 1 3
4 3 4
5 3 5
6 4 6
df['c2'] = df['c2'].astype(str)
df = df.groupby('c1')['c2'].apply(lambda x: ','.join(x.drop_duplicates())).reset_index()
print df
c1 c2
0 1 2,3
1 3 4,5
2 4 6
If you need sort DataFrame
by length of values in column c2
, use str.len
and sort_values
. Last you can drop
column sort
:
print df
c1 c2
0 1 4
1 1 5
2 4 6
3 2 7
4 2 3
5 2 2
6 2 3
df['c2'] = df['c2'].astype(str)
df = df.groupby('c1')['c2'].apply(lambda x: ','.join(x.drop_duplicates())).reset_index()
df['sort'] = df['c2'].str.len()
df = df.sort_values('sort')
df = df.drop('sort',axis=1)
print df
c1 c2
2 4 6
0 1 4,5
1 2 7,3,2
print df.reset_index(drop=True)
c1 c2
0 4 6
1 1 4,5
2 2 7,3,2
Upvotes: 5