Reputation: 360
My question is similar to this one but the answer seems to work not entirely!!
merge rows pandas dataframe based on condition
Given the following pandas Dataframe:
+---------+-----------------+-----------------+
| SECTION | TEXT | NUMBER_OF_WORDS |
+---------+-----------------+-----------------+
| ONE | lots of text… | 55 |
+---------+-----------------+-----------------+
| ONE | word1 | 1 |
+---------+-----------------+-----------------+
| ONE | lots of text… | 151 |
+---------+-----------------+-----------------+
| ONE | word2 | 1 |
+---------+-----------------+-----------------+
| ONE | word3 | 1 |
+---------+-----------------+-----------------+
| ONE | word4 | 1 |
+---------+-----------------+-----------------+
| TWO | lots of text… | 523 |
+---------+-----------------+-----------------+
| TWO | lots of text… | 123 |
+---------+-----------------+-----------------+
| TWO | word4 | 1 |
+---------+-----------------+-----------------+
If the value in the NUMBER_OF_WORDS column is 1; it has to be combined with the above row; provided they have the same SECTION value.
Thus the final result should be like this:
+---------+--------------------------------------+-----------------+
| SECTION | TEXT | NUMBER_OF_WORDS |
+---------+--------------------------------------+-----------------+
| ONE | lots of text…, word1 | 56 |
+---------+--------------------------------------+-----------------+
| ONE | lots of text…, word2, word3, word4 | 154 |
+---------+--------------------------------------+-----------------+
| TWO | lots of text… | 523 |
+---------+--------------------------------------+-----------------+
| TWO | lots of text…, word4 | 124 |
+---------+--------------------------------------+-----------------+
Here is the code; which seems to work but not as I want it.
df.groupby(['SECTION', (df.NUMBER_OF_WORDS.shift(1) == 1)], as_index=False, sort=False).agg({'TEXT': lambda x: ', '.join(x), 'NUMBER_OF_WORDS': lambda x: sum(x)})
UPDATE
This is the answer from BEN_YO; but he seems to have a minor typo. for the sake of making this question answered for future users I am going to include his answer slightly modified.
s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION': 'first','TEXT': lambda x: ', '.join(x),'NUMBER_OF_WORDS': lambda x: sum(x)})
Upvotes: 3
Views: 2421
Reputation: 323396
Let us try groupby
with cumsum
s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION':'first','TEXT':','.join,'NUMBER_OF_WORDS':'sum'})
Upvotes: 2