henry

Reputation: 965

Group pandas elements according to a column

I have the following pandas dataframe:

import pandas as pd
data = {'Sentences':['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4', 'Sentences5', 'Sentences6','Sentences7', 'Sentences8'],'Time':[1,0,0,1,0,0,1,0]}
df = pd.DataFrame(data)
print(df)

I was wondering how to group the "Sentences" values according to the "Time" column. Each group should start at a row where "Time" is 1 and include the following rows where "Time" is 0, up to (but not including) the next 1.

Maybe the expected output explains it better:

[[Sentence1,Sentence2,Sentence3],[Sentences4,Sentences5,Sentences6],[Sentences7,Sentences8]]

Is this somehow possible? Sorry, I am very new to pandas.

Upvotes: 0

Views: 33

Answers (1)

Scott Boston

Reputation: 153460

Try this:

s = df['Time'].cumsum()
df.set_index([s, df.groupby(s).cumcount()])['Sentences'].unstack().to_numpy().tolist()

Output:

[['Sentence1', 'Sentence2', 'Sentence3'],
 ['Sentences4', 'Sentences5', 'Sentences6'],
 ['Sentences7', 'Sentences8', nan]]
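The last row is padded with nan because unstack builds a rectangular table. If the ragged lists from the question are wanted instead, one possible alternative is to reuse the same cumsum key and collect each group into a list. This is a minimal sketch on the question's data; the variable name `groups` is only illustrative:

```python
import pandas as pd

data = {
    'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4',
                  'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
    'Time': [1, 0, 0, 1, 0, 0, 1, 0],
}
df = pd.DataFrame(data)

# Same grouping key as above: the running sum of Time.
s = df['Time'].cumsum()

# Collect the sentences of each group into a plain Python list.
# Groups keep their own length, so no nan padding is introduced.
groups = df.groupby(s)['Sentences'].apply(list).tolist()
print(groups)
# [['Sentence1', 'Sentence2', 'Sentence3'],
#  ['Sentences4', 'Sentences5', 'Sentences6'],
#  ['Sentences7', 'Sentences8']]
```

This assumes the first row has Time equal to 1; any leading rows with Time equal to 0 would end up in their own group labelled 0.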

Details:

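  • Use cumsum on Time to build a group label: the running total increases at each Time = 1, so every 1 and the 0s that follow it share the same label.
  • Next, use groupby with cumcount to number the rows within each group (0, 1, 2, ...).
  • Lastly, use set_index and unstack to reshape the dataframe into one row per group; groups shorter than the longest one are padded with nan.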
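To see what each step produces, the intermediate values can be printed. This is a small sketch on the question's data; the names `group_id`, `position` and `wide` are made up for illustration:

```python
import pandas as pd

data = {
    'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4',
                  'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
    'Time': [1, 0, 0, 1, 0, 0, 1, 0],
}
df = pd.DataFrame(data)

# Step 1: the running sum of Time gives one label per group.
group_id = df['Time'].cumsum()
print(group_id.tolist())   # [1, 1, 1, 2, 2, 2, 3, 3]

# Step 2: cumcount numbers the rows inside each group.
position = df.groupby(group_id).cumcount()
print(position.tolist())   # [0, 1, 2, 0, 1, 2, 0, 1]

# Step 3: index by (group, position) and pivot the position level
# into columns, giving one row per group.
wide = df.set_index([group_id, position])['Sentences'].unstack()
print(wide.shape)          # (3, 3)
```

The third group has only two rows, so its last cell in `wide` is missing, which is where the nan in the output comes from.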

Upvotes: 1
