ravat51

Reputation: 1

Pandas: count occurrences that contain some words and do not contain other words

I'm trying to count the entries that contain some words but do not contain certain other words. In other words, I want the number of matches for which an exclusion condition is not met. Here's what I have:

 import pandas as pd
 import re

 data = pd.read_csv('rando-file')

 vague_series = pd.DataFrame([
     data['text'].str.contains('bla1|bla2', flags=re.IGNORECASE, regex=True)
     & ~data['text'].str.contains('blah3|bla4', flags=re.IGNORECASE, regex=True)
 ])

 vague_count = vague_series.columns[0].sum()

 print(vague_count)

Every attempt to count or sum the matches has failed here with an invalid syntax error. Removing the columns[0] part simply produced 0/1 values in place of True/False.

Upvotes: 0

Views: 115

Answers (1)

Chrys Bltr

Reputation: 78

Could you post a data sample to test with?

I tried it with a custom sample and it works well:

import pandas as pd 
import re 

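sr = pd.Series(['New_York', 'Lisbon', 'Tokyo', 'Paris', 'Munich']) 

idx = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5'] 

sr.index = idx  
# True where an 'i' is followed by a letter AND no 's' is followed by a letter
result = (sr.str.contains(pat='i[a-z]', regex=True)) & (~sr.str.contains('s[a-z]', regex=True))
# Summing a boolean Series counts the True values
print(result.sum())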

>>>2
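To see why the wrapper in the question gets in the way, here is a minimal sketch on made-up data (the four rows of the hypothetical text column are invented, not taken from the asker's CSV). The combined mask sums directly to the number of matching rows, while pd.DataFrame([mask]) turns the whole mask into a single row whose column labels are the original row numbers:

```python
import re

import pandas as pd

# Hypothetical data standing in for the question's CSV (invented for illustration)
data = pd.DataFrame({'text': ['bla1 here', 'bla2 and bla4', 'nothing', 'BLA1 only']})

# Boolean mask: contains bla1 or bla2 AND does not contain blah3 or bla4
mask = (data['text'].str.contains('bla1|bla2', flags=re.IGNORECASE, regex=True)
        & ~data['text'].str.contains('blah3|bla4', flags=re.IGNORECASE, regex=True))

print(mask.tolist())  # [True, False, False, True]: one True/False per row
print(mask.sum())     # 2: True counts as 1, so this is the number of matching rows

# Wrapping the mask in a list makes pd.DataFrame treat it as ONE row;
# the original row labels (0..3) become the column names
wrapped = pd.DataFrame([mask])
print(wrapped.shape)  # (1, 4)
```

Here wrapped.columns[0] is just the label 0, so calling .sum() on it never touches the True/False values.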

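The problem is the pd.DataFrame([...]) wrapper: passing a list that contains the mask produces a one-row DataFrame whose columns are the original row labels, so columns[0] is just a label, not the boolean data. Don't wrap it; combine the two boolean Series and sum the result directly: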

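# Combine the two boolean Series directly; ~ negates the exclusion mask
vague_series = (data['text'].str.contains('bla1|bla2', flags=re.IGNORECASE, regex=True) & 
                ~data['text'].str.contains('blah3|bla4', flags=re.IGNORECASE, regex=True))
# True counts as 1, so the sum is the number of matching rows
count = vague_series.sum()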
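If the text column has missing values (common when reading a CSV), str.contains returns NaN for those rows by default, the result is no longer a plain boolean Series, and the ~ negation is not reliable. A hedged sketch, assuming a hypothetical text column with one missing entry, that passes na=False so missing text counts as "no match", and uses case=False as an alternative to flags=re.IGNORECASE:

```python
import pandas as pd

# Hypothetical data; None stands in for a missing value read from the CSV
data = pd.DataFrame({'text': ['Bla1 text', None, 'bla2 with blah3', 'bla2']})

# na=False: rows with missing text are treated as not matching
include = data['text'].str.contains('bla1|bla2', case=False, regex=True, na=False)
exclude = data['text'].str.contains('blah3|bla4', case=False, regex=True, na=False)

# Both masks are plain bool, so ~ and & behave as logical NOT / AND
vague_series = include & ~exclude
count = int(vague_series.sum())
print(count)  # 2
```

The missing row and the row containing blah3 are both excluded, leaving two matches.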

Upvotes: 1
