pylearner

Reputation: 1460

Get rows based on a condition and separate them into subsets

I am trying to split a dataset into subsets based on a condition, picking rows from each matching start row until the next start value appears.

Condition: where column A == 0, column B should start with 'a'.

Dataset:

A   B
0   aa
1   ss
2   dd
3   ff
0   ee
1   ff
2   bb
3   gg
0   ar
1   hh
2   ww
0   jj
1   ll

expected:

{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}

Each series starts at a row where column A == 0 and runs until the row before the next 0. In total there are 4 such series in the dataframe, and only those whose first B value starts with 'a' should be returned.

Upvotes: 0

Views: 63

Answers (3)

Michael Szczesny

Reputation: 5026

This answers this question's revision 2020-11-04 19:29:39Z. Later additions/edits to the question or additional requirements in the comments will not be considered.

First find the desired rows and select them into a new dataframe. Group the rows and convert them to dicts.

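# 2 on a start row (A == 0) whose B starts with 'a', 1 on any other start row,
# 0 on the remaining rows of this dataset; forward-filling the zeros and
# subtracting 1 leaves 1 on the rows to keep and 0 on the rows to drop
g = (df.A.eq(0).astype(int) + df.B.str.startswith('a')).replace(0, method='ffill') - 1
df_BeqA = df[g.astype('bool')]

# number the kept series from 0 and convert each one to a dict of lists
{x: y.to_dict('list') for x, y in df_BeqA.groupby(df_BeqA.A.eq(0).cumsum() - 1)}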

Out:

{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
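
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same filter-then-group idea. It rebuilds the dataframe from the question and swaps `replace(0, method='ffill')` for `where` plus `ffill`, since I believe the `method` argument of `replace` is deprecated in recent pandas versions. The names `starts`, `flag`, `keep`, `selected` and `result` are my own, not from the answer above.

```python
import pandas as pd

# Dataset copied from the question
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# A series starts wherever A == 0
starts = df["A"].eq(0)

# On each start row store 1.0 if B begins with 'a', else 0.0; other rows become NaN
flag = df["B"].str.startswith("a").astype(float).where(starts)

# Carry the start row's flag forward over the rest of its series
keep = flag.ffill().fillna(0).astype(bool)
selected = df[keep]

# Number the surviving series from 0 and convert each to a dict of lists
ids = selected["A"].eq(0).cumsum() - 1
result = {int(k): v.to_dict("list") for k, v in selected.groupby(ids)}
print(result)
```

Working with floats for `flag` is just a way to keep the NaN placeholders without falling back to object dtype; any equivalent masking would do.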

Upvotes: 0

Quang Hoang

Reputation: 150785

Do a cumsum on the condition to identify the groups, then groupby:

groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()

{k:v.to_dict(orient='list') for k,v in df.groupby(groups)}

Output:

{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
 3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}
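
To see why a cumulative sum works as a group label, here is a small sketch on a made-up boolean mask (the `is_start` values are illustrative only, not taken from the question's data). Every True bumps the running total, so all rows up to the next True share one label.

```python
import pandas as pd

# Hypothetical mask: True marks the first row of each group
is_start = pd.Series([True, False, False, True, False, True])

# Each True increments the running total, so the total doubles as a group id
groups = is_start.cumsum()
labels = groups.tolist()
print(labels)  # [1, 1, 1, 2, 2, 3]
```

Passing a Series like `groups` to `df.groupby` then splits the frame at each True row.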

Upvotes: 2

BENY

Reputation: 323316

Maybe try with cumsum as well:

{x: y.to_dict('list') for x, y in df.groupby(df['A'].eq(0).cumsum())}
Out[87]: 
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 2: {'A': [0, 1, 2, 3], 'B': ['ee', 'ff', 'bb', 'gg']},
 3: {'A': [0, 1, 2], 'B': ['rr', 'hh', 'ww']},
 4: {'A': [0, 1], 'B': ['jj', 'll']}}
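
If only the series whose first B value starts with 'a' are wanted, one possible follow-up is to filter the groups while iterating and renumber them from 0. This is a rough sketch using the dataset as it currently appears in the question; `segment`, `kept` and `result` are names I made up for illustration.

```python
import pandas as pd

# Dataset copied from the question
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# Label each series: the counter goes up by one at every A == 0 row
segment = df["A"].eq(0).cumsum()

# Keep a series only if the B value on its first row starts with 'a'
kept = [
    g.to_dict("list")
    for _, g in df.groupby(segment)
    if g["B"].iloc[0].startswith("a")
]

# Renumber the surviving series from 0
result = dict(enumerate(kept))
print(result)
```

The plain Python loop keeps the condition easy to read; for a large frame a vectorised mask would likely be faster.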

Upvotes: 2
