Reputation: 181
I have an excel file with the column A (names) and column B (description) in which i have a long description of the profile of the person. It looks like:
Name Description
James R A good systems developer...
I'm trying to count how many times for example the word 'good' appears in each row of the column 'description' and create a new column with the number of repetitions. I have a lot of values so I prefer to use pandas than excel formulas. The output should look like this:
Name Description Good
James R A good systems developer... 1
The python code that I develop is this:
In [1]: import collections
In [2]: import pandas as pd
In [3]: df=pd.read_excel('israel2013.xls')
In [4]: str1=df.description
In [5]: str2= 'good'
In [6]: for index, row in df.iterrows():
...: if str2 in str1:
...: counter=collections.Counter (r[0] for str2 in str1)
...: else:
...: print (0)
But I get all zeros from this, and I don't know whats wrong. Thank you
Upvotes: 2
Views: 83
Reputation: 18647
Try:
df['Good'] = df['description'].str.findall('good').str.len()
Upvotes: 0
Reputation: 78750
Demo dataframe:
>>> data = [['James R', 'A good systems developer'], ['Bob C', 'a guy called Bob'], ['Alice R', 'Good teacher and a good runner']]
>>> df = pd.DataFrame(data, columns=['Name', 'Description'])
>>>
>>> df
Name Description
0 James R A good systems developer
1 Bob C a guy called Bob
2 Alice R Good teacher and a good runner
Solution:
>>> df['Good'] = df.Description.str.count(r'(?i)\bgood\b')
>>> df
Name Description Good
0 James R A good systems developer 1
1 Bob C a guy called Bob 0
2 Alice R Good teacher and a good runner 2
\b
marks word boundaries, (?i)
performs a case-insensitive search. Alternatively to using (?i)
, you could import re
and supply flags=re.IGNORECASE
as the second argument to count
.
Upvotes: 2