Reputation: 387
I have a pandas dataframe in this format:
id Role time text
id1 A t1 text1
id1 B t2 text2
id1 B t3 text3
id1 A t4 text4
id1 B t5 text5
id1 A t6 text6
id1 A t7 text7
id2 A t8 text8
id2 B t9 text9
id2 B t10 text1
id2 B t11 text10
id2 A t12 text11
id2 A t13 text12
I would like to form a dataframe or even a text file like this:
id Role text
id1 A text1
id1 B text2, text3
id1 A text4
id1 B text5
id1 A text6, text7
id2 A text8
id2 B text9, text10, text11
id2 A text12, text13
id2 B text11
id2 A text12, text13
Or in the text format:
text1
text2, text3
text4
text5
text6, text7
==NEXT ID==
text8
text9, text10, text11
text12, text13
text11
text12, text13
I've used group by in pandas but I can't get it done since group by Role will put al texts together. Thanks.
Upvotes: 2
Views: 148
Reputation: 294278
check if 'Role'
is equal to its prior value and do a cumulative sum to simulate a third grouping.
g3 = (df.Role != df.Role.shift()).cumsum().rename('clump')
df.groupby(['id', 'Role', g3], sort=False).text.apply(', '.join).reset_index()
id Role clump text
0 id1 A 1 text1
1 id1 B 2 text2, text3
2 id1 A 3 text4
3 id1 B 4 text5
4 id1 A 5 text6, text7
5 id2 A 5 text8
6 id2 B 6 text9, text1, text10
7 id2 A 7 text11, text12
To dump this to text files
g3 = (df.Role != df.Role.shift()).cumsum().rename('clump')
d1 = df.groupby(['id', 'Role', g3], sort=False).text.apply(', '.join).reset_index()
for n, g in d1.groupby('id').text:
print(g)
print()
# g.to_csv('{}.csv'.format(n))
0 text1
1 text2, text3
2 text4
3 text5
4 text6, text7
Name: text, dtype: object
5 text8
6 text9, text1, text10
7 text11, text12
Name: text, dtype: object
Upvotes: 3