Reputation: 965
I have the following pandas dataframe:
import pandas as pd
data = {'Sentences':['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4', 'Sentences5', 'Sentences6','Sentences7', 'Sentences8'],'Time':[1,0,0,1,0,0,1,0]}
df = pd.DataFrame(data)
print(df)
I was wondering how to extract all the "Sentences" according to the "Time" column. I want to gather the sentences into groups, where each group starts at a row with "Time" equal to 1 and runs through the following rows with "Time" equal to 0.
Maybe the expected output explains it better:
[['Sentence1', 'Sentence2', 'Sentence3'], ['Sentences4', 'Sentences5', 'Sentences6'], ['Sentences7', 'Sentences8']]
Is this somehow possible? Sorry, I am very new to pandas.
Upvotes: 0
Views: 33
Reputation: 153460
Try this:
s = df['Time'].cumsum()
df.set_index([s, df.groupby(s).cumcount()])['Sentences'].unstack().to_numpy().tolist()
Output:
[['Sentence1', 'Sentence2', 'Sentence3'],
['Sentences4', 'Sentences5', 'Sentences6'],
['Sentences7', 'Sentences8', nan]]
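The nan in the last row is padding: unstack builds a rectangular table, so any group shorter than the longest one gets filled. If the ragged lists from the question are wanted instead, one option is to aggregate each cumsum group straight into a list. This is a minimal sketch rather than part of the answer above; it assumes the first row has Time equal to 1, and the name groups is only illustrative.

```python
import pandas as pd

data = {'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4',
                      'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
        'Time': [1, 0, 0, 1, 0, 0, 1, 0]}
df = pd.DataFrame(data)

# Each 1 in Time bumps the running total, so rows sharing a total form one group.
s = df['Time'].cumsum()

# Collect the sentences of each group into a plain Python list (no nan padding).
groups = df.groupby(s)['Sentences'].agg(list).tolist()
print(groups)
```

Because nothing is reshaped into a table here, groups of different lengths stay as they are.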
cumsum to group each Time = 1 row with the following Time = 0 rows.
groupby with cumcount to increment within each group.
set_index and unstack to reshape the dataframe.
Upvotes: 1
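To see what each step contributes, it may help to print the intermediate values. The sketch below rebuilds the dataframe from the question; the variable names pos, group_labels, positions and wide are illustrative and not from the original answer.

```python
import pandas as pd

data = {'Sentences': ['Sentence1', 'Sentence2', 'Sentence3', 'Sentences4',
                      'Sentences5', 'Sentences6', 'Sentences7', 'Sentences8'],
        'Time': [1, 0, 0, 1, 0, 0, 1, 0]}
df = pd.DataFrame(data)

# Step 1: the running total of Time gives every row a group label.
s = df['Time'].cumsum()
group_labels = s.tolist()
print(group_labels)   # [1, 1, 1, 2, 2, 2, 3, 3]

# Step 2: cumcount numbers the rows inside each group, starting at 0.
pos = df.groupby(s).cumcount()
positions = pos.tolist()
print(positions)      # [0, 1, 2, 0, 1, 2, 0, 1]

# Step 3: index by (group, position), then pivot position into columns.
wide = df.set_index([s, pos])['Sentences'].unstack()
print(wide.shape)     # (3, 3)
```

The group label becomes the row of the reshaped table and the position becomes the column, which is why the third group, with only two sentences, ends up with an empty cell.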