Reputation: 1460
I am trying to subset a dataset based on a condition, picking the rows from each matching row up to the next row where column A is 0 again.
Condition: where column A == 0, column B should start with 'a'.
Dataset:
A B
0 aa
1 ss
2 dd
3 ff
0 ee
1 ff
2 bb
3 gg
0 ar
1 hh
2 ww
0 jj
1 ll
Expected:
{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
Each series starts where column A == 0 and runs until the row before the next 0. In total there are 4 such series in the dataframe, but only those whose first B value starts with 'a' should be returned.
Upvotes: 0
Views: 63
Reputation: 5026
This answers this question's revision 2020-11-04 19:29:39Z. Later additions/edits to the question or additional requirements in the comments will not be considered.
First find the desired rows and select them into a new dataframe. Group the rows and convert them to dicts.
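# 2 at a start row whose B begins with 'a', 1 at any other start row, 0 elsewhere; forward-fill the zeros, then subtract 1
g = (df.A.eq(0).astype(int) + df.B.str.startswith('a')).replace(0, method='ffill') - 1
# Keep only the rows of series whose first B value starts with 'a'
df_BeqA = df[g.astype('bool')]
# Number the remaining series from 0 and convert each one to a dict of lists
{x: y.to_dict('list') for x, y in df_BeqA.groupby(df_BeqA.A.eq(0).cumsum() - 1)}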
Out:
{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
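On recent pandas releases the `method` argument of `replace` may emit a deprecation warning. As a hedged alternative (a sketch, not the answer's original code), the same row selection can be expressed with `groupby(...).transform("first")`. The names `is_start`, `segment`, `good_start`, `keep` and `result` are my own, and the dataframe is rebuilt from the question's data so the snippet runs on its own:

```python
import pandas as pd

# The dataset from the question
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

is_start = df["A"].eq(0)      # rows that open a new series
segment = is_start.cumsum()   # series number for every row (1, 2, 3, ...)

# True only at a start row whose B begins with 'a'
good_start = is_start & df["B"].str.startswith("a")

# Broadcast the flag of each series' first row to all rows of that series
keep = good_start.groupby(segment).transform("first").astype(bool)

selected = df[keep]
# Renumber the surviving series from 0 and convert each to a dict of lists
result = {
    i: part.to_dict(orient="list")
    for i, (_, part) in enumerate(selected.groupby(segment[keep]))
}
print(result)
# {0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
```

Rows that appear before the first A == 0 (none in this dataset) fall into segment 0 and are dropped, because their first row is not a start row.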
Upvotes: 0
Reputation: 150785
Do a cumsum on the condition to identify the groups, then groupby:
groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()
{k:v.to_dict(orient='list') for k,v in df.groupby(groups)}
Output:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}
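Note that grouping on the combined condition alone keeps a non-matching series attached to the group before it, as with `jj`, `ll` under key 3 above. If those rows should be dropped, one possible extension (a sketch under my own naming, with `seg`, `grp` and `keep` as assumed helper variables) is to compare the combined grouping with a plain A == 0 grouping and keep only the rows that still belong to the first series of each group:

```python
import pandas as pd

# The dataset from the question
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

is_start = df["A"].eq(0)
seg = is_start.cumsum()                                    # every A == 0 opens a series
grp = (is_start & df["B"].str.startswith("a")).cumsum()    # only matching starts open a group

# Within each group, keep the rows whose series number equals that of the
# group's first row; grp > 0 discards anything before the first matching start
keep = grp.gt(0) & seg.eq(seg.groupby(grp).transform("first"))

result = {
    int(k): v.to_dict(orient="list")
    for k, v in df[keep].groupby(grp[keep])
}
print(result)
# {1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 2: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
```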
Upvotes: 2
Reputation: 323316
Maybe try with cumsum as well:
{x: y.to_dict('list') for x, y in df.groupby(df['A'].eq(0).cumsum())}
Out:
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
3: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']},
4: {'A': [0, 1], 'B': ['jj', 'll']}}
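This gives every series, including the ones whose first B value does not start with 'a'. To get only the matching ones, the groups can be filtered in plain Python before converting them. A minimal sketch, assuming the question's data; `segments`, `matching` and `result` are names I made up for illustration:

```python
import pandas as pd

# The dataset from the question
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# One group per series: a new series begins at every A == 0
segments = df.groupby(df["A"].eq(0).cumsum())

# Keep a series only if its first row has A == 0 and B starting with 'a'
matching = [
    part for _, part in segments
    if part["A"].iloc[0] == 0 and part["B"].iloc[0].startswith("a")
]

# Number the kept series from 0, as in the expected output
result = {i: part.to_dict(orient="list") for i, part in enumerate(matching)}
print(result)
# {0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
```

The explicit loop is slower than a vectorized mask on large frames, but it keeps the condition easy to read and change.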
Upvotes: 2