Create a new excel column with the number of repetitions a value occurs in each row of a column with PANDAS

Question

I have an excel file with the column A (names) and column B (description) in which i have a long description of the profile of the person. It looks like:

Name             Description
James R          A good systems developer...

I'm trying to count how many times for example the word 'good' appears in each row of the column 'description' and create a new column with the number of repetitions. I have a lot of values so I prefer to use pandas than excel formulas. The output should look like this:

Name             Description                   Good
James R          A good systems developer...     1

The python code that I develop is this:

In [1]: import collections

In [2]: import pandas as pd

In [3]: df=pd.read_excel('israel2013.xls')

In [4]: str1=df.description

In [5]: str2= 'good'

In [6]: for index, row in df.iterrows():
   ...:     if str2 in str1:
   ...:         counter=collections.Counter (r[0] for str2 in str1)
   ...:     else:
   ...:         print (0)

But I get all zeros from this, and I don't know whats wrong. Thank you

timgeb · Accepted Answer

Demo dataframe:

>>> data = [['James R', 'A good systems developer'], ['Bob C', 'a guy called Bob'], ['Alice R', 'Good teacher and a good runner']]                                           
>>> df = pd.DataFrame(data, columns=['Name', 'Description'])                                                                                                                 
>>>                                                                                                                                                                          
>>> df                                                                                                                                                                       
      Name                     Description
0  James R        A good systems developer
1    Bob C                a guy called Bob
2  Alice R  Good teacher and a good runner

Solution:

>>> df['Good'] = df.Description.str.count(r'(?i)\bgood\b')                                                                                                       
>>> df                                                                                                                                                                       
      Name                     Description  Good
0  James R        A good systems developer     1
1    Bob C                a guy called Bob     0
2  Alice R  Good teacher and a good runner     2

\b marks word boundaries, (?i) performs a case-insensitive search. Alternatively to using (?i), you could import re and supply flags=re.IGNORECASE as the second argument to count.

Create a new excel column with the number of repetitions a value occurs in each row of a column with PANDAS

Answers (2)

Related Questions