Reputation: 3432
Hello, I have a df such as:
Groups COL1
G1 Seq:1
G1 Seq:2
G1 Seq_1
G1 Seq:4
G2 Seq_2
G2 Seq_3
G2 Seq_4
G3 Seq:5
G3 Seq:6
G4 Seq:7
G4 Seq_5
and I would like to count :
Does anyone have an idea? I guess I should use re.sub
and then take the sum for each group
in pandas?
Upvotes: 0
Views: 56
Reputation: 20659
You can build a boolean mask with pd.Series.str.contains,
then group it by Groups and count with GroupBy.all
and GroupBy.any:
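# True where the value contains ':'
om = df['COL1'].str.contains(':')
# groups where every row contains ':'
one = om.groupby(df['Groups']).all().sum() # 1
# groups that mix both kinds of rows
two = om.groupby(df['Groups']).any().sum() - one # 2
# Subtract `one` because `any` is also True for groups where every row
# matches, so the all-True groups have to be removed from the count.
# groups where no row contains ':'
three = (~om).groupby(df['Groups']).all().sum() # 1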
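To run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same mask. The `df` construction and the names `all_colon`, `any_colon` and `counts` are my own scaffolding, not part of the original post, and it assumes the three things to count are "only ':'", "both" and "only '_'".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] * 2,
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4",
             "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6",
             "Seq:7", "Seq_5"],
})

# True where the value contains ':'.
om = df["COL1"].str.contains(":")

grouped = om.groupby(df["Groups"])
all_colon = grouped.all()   # per group: every row contains ':'
any_colon = grouped.any()   # per group: at least one row contains ':'

counts = {
    "only_colon": int(all_colon.sum()),
    "mixed": int((any_colon & ~all_colon).sum()),
    "only_underscore": int((~any_colon).sum()),
}
print(counts)
```

Writing the mixed case as `any_colon & ~all_colon` gives the same number as `any().sum() - one`, but it also keeps a per-group boolean Series in case you want the group names.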
Upvotes: 2
Reputation: 863246
Use Series.str.contains
to build a mask, then compare the Groups values with numpy.setdiff1d,
selecting rows with DataFrame.loc
using either the inverted mask (~m)
or the mask itself:
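import numpy as np

m = df['COL1'].str.contains(':')
# groups that have no row without ':'
a = np.setdiff1d(df['Groups'], df.loc[~m, 'Groups']).tolist()
print (a)
['G3']
# groups that have no row with ':'
c = np.setdiff1d(df['Groups'], df.loc[m, 'Groups']).tolist()
print (c)
['G2']
# groups with at least one row without ':' that are not in c, i.e. mixed groups
b = np.setdiff1d(df.loc[~m, 'Groups'], c).tolist()
print (b)
['G1', 'G4']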
And for the counts, get the length of each list:
print (len(a))
print (len(b))
print (len(c))
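As a variation, the same three lists can be built with plain Python sets instead of numpy.setdiff1d. This is a sketch of a swapped-in technique, not the method above; the `df` construction and the names `has_colon` and `has_other` are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] * 2,
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4",
             "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6",
             "Seq:7", "Seq_5"],
})

m = df["COL1"].str.contains(":")

has_colon = set(df.loc[m, "Groups"])    # groups with at least one ':' row
has_other = set(df.loc[~m, "Groups"])   # groups with at least one non-':' row

a = sorted(has_colon - has_other)  # only ':' rows
b = sorted(has_colon & has_other)  # both kinds of rows
c = sorted(has_other - has_colon)  # no ':' rows

print(a, b, c)
print(len(a), len(b), len(c))
```

Here the mixed groups come straight from a set intersection, so `b` does not depend on `c` having been computed first.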
Upvotes: 2