Alicia Pliego
Alicia Pliego

Reputation: 181

Create a new excel column with the number of repetitions a value occurs in each row of a column with PANDAS

I have an excel file with the column A (names) and column B (description) in which i have a long description of the profile of the person. It looks like:

Name             Description
James R          A good systems developer...

I'm trying to count how many times for example the word 'good' appears in each row of the column 'description' and create a new column with the number of repetitions. I have a lot of values so I prefer to use pandas than excel formulas. The output should look like this:

Name             Description                   Good
James R          A good systems developer...     1

The python code that I develop is this:

In [1]: import collections

In [2]: import pandas as pd

In [3]: df=pd.read_excel('israel2013.xls')

In [4]: str1=df.description

In [5]: str2= 'good'

In [6]: for index, row in df.iterrows():
   ...:     if str2 in str1:
   ...:         counter=collections.Counter (r[0] for str2 in str1)
   ...:     else:
   ...:         print (0)

But I get all zeros from this, and I don't know whats wrong. Thank you

Upvotes: 2

Views: 83

Answers (2)

Chris Adams
Chris Adams

Reputation: 18647

Try:

df['Good'] = df['description'].str.findall('good').str.len()

Upvotes: 0

timgeb
timgeb

Reputation: 78750

Demo dataframe:

>>> data = [['James R', 'A good systems developer'], ['Bob C', 'a guy called Bob'], ['Alice R', 'Good teacher and a good runner']]                                           
>>> df = pd.DataFrame(data, columns=['Name', 'Description'])                                                                                                                 
>>>                                                                                                                                                                          
>>> df                                                                                                                                                                       
      Name                     Description
0  James R        A good systems developer
1    Bob C                a guy called Bob
2  Alice R  Good teacher and a good runner

Solution:

>>> df['Good'] = df.Description.str.count(r'(?i)\bgood\b')                                                                                                       
>>> df                                                                                                                                                                       
      Name                     Description  Good
0  James R        A good systems developer     1
1    Bob C                a guy called Bob     0
2  Alice R  Good teacher and a good runner     2

\b marks word boundaries, (?i) performs a case-insensitive search. Alternatively to using (?i), you could import re and supply flags=re.IGNORECASE as the second argument to count.

Upvotes: 2

Related Questions