tawny_burrito

Reputation: 61

How to iterate down a column of names to populate a new column with the number of occurrences of each name

I am working with a dataframe column called "companies" (you can see what it looks like below). I would like to use it to create another column called "occurrences", filled sequentially with a running count of how many times each company name has occurred so far. I want it to look like this:

company   |   occurrences

company 1 |   1
company 1 |   2
company 1 |   3
company 2 |   1
company 2 |   2
company 3 |   1
company 4 |   1
company 4 |   2
company 5 |   1
company 5 |   2
company 5 |   3
company 5 |   4

Unfortunately, I'm having quite a bit of trouble doing this. My attempt is below. First, it creates an infinite while loop that I can't figure out how to break out of. Second, even if it worked, it would fill in the values incorrectly. On top of that, the if statement nested within the while loop returns the entire column instead of the final count for the company.

 def occurrences(companies):
     occurrences = []
     for i in range(len(companies)):
         x = 0
         occurrences.append(x)
         while str(companies[i]) == str(companies[i+1]):
             x = x+1
             occurrences.append(x)
             if str(companies[i]) is not str(companies[i+1]):
                 x = companies.str.count(companies[i])
                 occurrences.append(x)
     return round_number

 occurrences(companies)

I know the line "for i in range(len(companies))" iterates down the column and I know that "str(companies[i]) == str(companies[i+1])" does compare the company names correctly. I believe everything else is completely wrong though. Any advice would be much appreciated.

Upvotes: 0

Views: 126

Answers (2)

ansev

Reputation: 30920

You don't need to use a loop for this. You can use groupby + cumcount:

df['ocurrence']=df.groupby('company').cumcount()+1
print(df)

      company  ocurrence
0   company 1          1
1   company 1          2
2   company 1          3
3   company 2          1
4   company 2          2
5   company 3          1
6   company 4          1
7   company 4          2
8   company 5          1
9   company 5          2
10  company 5          3
11  company 5          4
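
For a version that runs on its own, here is a minimal sketch that first rebuilds the sample column. The DataFrame construction and the `names` list are assumptions based on the table in the question, and the new column is named `occurrences` here to match what was asked for.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
names = (["company 1"] * 3 + ["company 2"] * 2 + ["company 3"]
         + ["company 4"] * 2 + ["company 5"] * 4)
df = pd.DataFrame({"company": names})

# cumcount() numbers the rows within each group starting at 0, so add 1.
df["occurrences"] = df.groupby("company").cumcount() + 1

print(df["occurrences"].tolist())
```

Note that groupby counts every occurrence of a name across the whole column, not only consecutive runs. The two coincide when equal names are adjacent, as they are in the sample.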

Upvotes: 1

Prune

Reputation: 77827

You have just a few errors:

 for i in range(len(companies)):

This should be your only loop; it will drive your travel down the column. Everything else will simply use the row index i.

     while str(companies[i]) == str(companies[i+1]):

Use if; you make this check only once per iteration. Making it a while means that something within this while loop has to alter the value of i, or the values in your table. Otherwise the condition never changes, and you have an infinite loop.

         if str(companies[i]) is not str(companies[i+1]):

I don't understand why this exists. The is not operator tests object identity rather than equality, so it can be true even when the two names are equal; compare values with != instead. In any case, this check sits inside a loop that only runs while the two names are equal, so it is in the wrong place. If you're trying to do something when the labels are different, then un-indent this and replace it with a simple else, paired with the if you just made from the ill-formed while.
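
Putting these points together, a corrected loop might look like the sketch below. It assumes equal names are adjacent, as in the question's sample, and it compares each row with the previous one rather than the next so the index never runs past the end of the column. The function name `count_occurrences` and the `sample` list are illustrative, not taken from the original code.

```python
def count_occurrences(companies):
    """Return a running count for each run of equal adjacent names."""
    occurrences = []
    for i in range(len(companies)):
        # Same name as the previous row: continue the current count.
        if i > 0 and str(companies[i]) == str(companies[i - 1]):
            occurrences.append(occurrences[-1] + 1)
        else:
            # First row, or the name changed: restart the count at 1.
            occurrences.append(1)
    return occurrences


sample = ["company 1", "company 1", "company 1",
          "company 2", "company 2", "company 3"]
print(count_occurrences(sample))
```

When the names live in a pandas column, passing a plain list such as `df['company'].tolist()` keeps the positional indexing above independent of the DataFrame's index labels.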

Upvotes: 0
