Pandas combine consecutive rows based on condition

Question

My question is similar to this one but the answer seems to work not entirely!!

merge rows pandas dataframe based on condition

Given the following pandas Dataframe:

+---------+-----------------+-----------------+
| SECTION | TEXT            | NUMBER_OF_WORDS |
+---------+-----------------+-----------------+
| ONE     | lots   of text… | 55              |
+---------+-----------------+-----------------+
| ONE     | word1           | 1               |
+---------+-----------------+-----------------+
| ONE     | lots   of text… | 151             |
+---------+-----------------+-----------------+
| ONE     | word2           | 1               |
+---------+-----------------+-----------------+
| ONE     | word3           | 1               |
+---------+-----------------+-----------------+
| ONE     | word4           | 1               |
+---------+-----------------+-----------------+
| TWO     | lots   of text… | 523             |
+---------+-----------------+-----------------+
| TWO     | lots   of text… | 123             |
+---------+-----------------+-----------------+
| TWO     | word4           | 1               |
+---------+-----------------+-----------------+

If the value in the NUMBER_OF_WORDS column is 1; it has to be combined with the above row; provided they have the same SECTION value.

Thus the final result should be like this:

+---------+--------------------------------------+-----------------+
| SECTION | TEXT                                 | NUMBER_OF_WORDS |
+---------+--------------------------------------+-----------------+
| ONE     | lots   of text…, word1               | 56              |
+---------+--------------------------------------+-----------------+
| ONE     | lots   of text…, word2, word3, word4 | 154             |
+---------+--------------------------------------+-----------------+
| TWO     | lots   of text…                      | 523             |
+---------+--------------------------------------+-----------------+
| TWO     | lots   of text…, word4               | 124             |
+---------+--------------------------------------+-----------------+

Here is the code; which seems to work but not as I want it.

df.groupby(['SECTION', (df.NUMBER_OF_WORDS.shift(1) == 1)], as_index=False, sort=False).agg({'TEXT': lambda x: ', '.join(x), 'NUMBER_OF_WORDS': lambda x: sum(x)})

UPDATE

This is the answer from BEN_YO; but he seems to have a minor typo. for the sake of making this question answered for future users I am going to include his answer slightly modified.

s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION': 'first','TEXT': lambda x: ', '.join(x),'NUMBER_OF_WORDS': lambda x: sum(x)})

BENY · Accepted Answer

Let us try groupby with cumsum

s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION':'first','TEXT':','.join,'NUMBER_OF_WORDS':'sum'})

Pandas combine consecutive rows based on condition

Answers (1)

Related Questions