chippycentra

Reputation: 3432

Count the sum of pattern matches in pandas

Hello, I have a df such as:

Groups COL1
G1   Seq:1
G1   Seq:2
G1   Seq_1
G1   Seq:4
G2   Seq_2
G2   Seq_3
G2   Seq_4
G3   Seq:5
G3   Seq:6
G4   Seq:7
G4   Seq_5

and I would like to count:

  1. Number of Groups with only ":" = 1 (G3)
  2. Number of Groups with some, but not only, ":" = 2 (G1 and G4)
  3. Number of Groups without any ":" = 1 (G2)

Does someone have an idea? I guess I should use re.sub and then sum over each of the Groups in pandas?

Upvotes: 0

Views: 56

Answers (2)

Ch3steR

Reputation: 20659

You can build a boolean mask with pd.Series.str.contains, then count per group with GroupBy.all and GroupBy.any:

om = df['COL1'].str.contains(':')

one = om.groupby(df['Groups']).all().sum() # 1
two = om.groupby(df['Groups']).any().sum() - one # 2
# Subtract `one` because `any` also counts the groups where every
# value is True, and those groups are already counted in `one`.
three = (~om).groupby(df['Groups']).all().sum() # 1
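
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample in the question, and the variable names (`has_colon`, `only_colon`, `mixed`, `no_colon`) are my own and not from the answer. Passing `regex=False` is an assumption that a literal ":" is what you want to match.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] * 2,
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4",
             "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6",
             "Seq:7", "Seq_5"],
})

# True for every row whose COL1 contains a literal ":".
has_colon = df["COL1"].str.contains(":", regex=False)

# One boolean per group: do all rows match / does any row match?
by_group = has_colon.groupby(df["Groups"])
all_colon = by_group.all()
any_colon = by_group.any()

only_colon = int(all_colon.sum())            # every row has ":"
mixed = int((any_colon & ~all_colon).sum())  # some rows, but not all
no_colon = int((~any_colon).sum())           # no row has ":"

print(only_colon, mixed, no_colon)
```

Here `any_colon & ~all_colon` selects the mixed groups directly, which should give the same number as subtracting `one` from the `any` count.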

Upvotes: 2

jezrael

Reputation: 863246

Use Series.str.contains to build a boolean mask, then use numpy.setdiff1d to compare the group names against the names selected by DataFrame.loc, filtering with either the mask or its inverse (~):

import numpy as np

m = df['COL1'].str.contains(':')

a = np.setdiff1d(df['Groups'], df.loc[~m, 'Groups']).tolist()
print (a)
['G3']

c = np.setdiff1d(df['Groups'], df.loc[m, 'Groups']).tolist()
print (c)
['G2']

b = np.setdiff1d(df.loc[~m, 'Groups'], c).tolist()
print (b)
['G1', 'G4']

And to get the counts, take the length of each list:

print (len(a))
print (len(b))
print (len(c))
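
If you want the group names and the counts from one pass, a variation (not taken from either answer, so treat it as a sketch) is to compute the share of rows containing ":" per group and label each group from that share. The helper `classify` and the labels "only", "mixed" and "none" are hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] * 2,
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4",
             "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6",
             "Seq:7", "Seq_5"],
})

# Fraction of rows per group whose COL1 contains a literal ":".
share = df["COL1"].str.contains(":", regex=False).groupby(df["Groups"]).mean()


def classify(fraction):
    """Label a group by the fraction of its rows that contain ':'."""
    if fraction == 1.0:
        return "only"
    if fraction == 0.0:
        return "none"
    return "mixed"


labels = share.map(classify)    # one label per group
counts = labels.value_counts()  # number of groups per label

print(labels)
print(counts)
```

`labels` keeps the group names, so you can see which group fell into which category, while `counts` gives the three numbers asked for in the question.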

Upvotes: 2
