Reputation: 61
I am working with a column of a dataframe called "companies" (you can see what it looks like below). I would like to use this column to create another column called "occurrences". My goal is to populate the occurrences column with a running count of how many times each company name has occurred so far. I want it to look like this:
company | occurrences
company 1 | 1
company 1 | 2
company 1 | 3
company 2 | 1
company 2 | 2
company 3 | 1
company 4 | 1
company 4 | 2
company 5 | 1
company 5 | 2
company 5 | 3
company 5 | 4
Unfortunately, I'm having quite a bit of trouble doing this. Below is my attempt. First, it creates an infinite while loop that I can't figure out how to break out of. Second, even if it worked, it would fill the values incorrectly. On top of that, the if statement nested within the while statement returns the entire column instead of the final count of companies.
def occurrences(companies):
    occurrences = []
    for i in range(len(companies)):
        x = 0
        occurrences.append(x)
        while str(companies[i]) == str(companies[i+1]):
            x = x+1
            occurrences.append(x)
            if str(companies[i]) is not str(companies[i+1]):
                x = companies.str.count(companies[i])
                occurrences.append(x)
    return round_number
occurrences(companies)
I know the line "for i in range(len(companies))" iterates down the column and I know that "str(companies[i]) == str(companies[i+1])" does compare the company names correctly. I believe everything else is completely wrong though. Any advice would be much appreciated.
Upvotes: 0
Views: 126
Reputation: 30920
You don't need to use a loop for this. You can use groupby + cumcount:
df['ocurrence']=df.groupby('company').cumcount()+1
print(df)
company ocurrence
0 company 1 1
1 company 1 2
2 company 1 3
3 company 2 1
4 company 2 2
5 company 3 1
6 company 4 1
7 company 4 2
8 company 5 1
9 company 5 2
10 company 5 3
11 company 5 4
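To make clear what this numbering computes, here is a plain-Python sketch of the same idea using a running counter per name. It is only an illustration and not how pandas implements cumcount; the helper name cumulative_count and the sample list are made up for this example.

```python
from collections import defaultdict

def cumulative_count(names):
    """Return, for each name, how many times it has appeared so far (1-based)."""
    seen = defaultdict(int)  # running tally per company name
    result = []
    for name in names:
        seen[name] += 1
        result.append(seen[name])
    return result

# Hypothetical sample data modelled on the table in the question
names = ["company 1", "company 1", "company 1", "company 2", "company 2", "company 3"]
print(cumulative_count(names))  # [1, 2, 3, 1, 2, 1]
```

Note that this counts over the whole column, so a name that reappears after other names continues its count instead of restarting at 1.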
Upvotes: 1
Reputation: 77827
You have just a few errors:
for i in range(len(companies)):
This should be your only loop; it will drive your travel down the column. Everything else will simply use i as the line index.
while str(companies[i]) == str(companies[i+1]):
Use if; you make this check only once per iteration. Making it a while means that something within this while loop has to alter the value of i, or the values in your table; otherwise, the condition never changes, and you have an infinite loop.
if str(companies[i]) is not str(companies[i+1]):
I don't understand why this exists. First, is not compares object identity, not string value, so it is not a reliable test for whether two names differ; use != for that. Second, if you're trying to do something when the labels are different, then un-indent this and replace it with a simple else, referring to the if you just made from the ill-formed while.
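Putting those points together, a minimal sketch of the corrected loop might look like the following. It assumes companies can be indexed by position (a list, or a Series with the default integer index), and it compares each row with the previous one rather than with i+1, which is my own change to avoid indexing past the last row.

```python
def occurrences(companies):
    """Number each row within a run of identical consecutive names, starting at 1."""
    counts = []
    x = 0
    for i in range(len(companies)):
        # Single if/else per iteration; no inner while loop is needed.
        if i > 0 and str(companies[i]) == str(companies[i - 1]):
            x = x + 1  # same company as the row above: continue the count
        else:
            x = 1      # first row, or a new company: restart the count
        counts.append(x)
    return counts

# Hypothetical sample data modelled on the table in the question
names = ["company 1", "company 1", "company 1", "company 2", "company 2", "company 3"]
print(occurrences(names))  # [1, 2, 3, 1, 2, 1]
```

This version restarts the count whenever the name changes from one row to the next, so it gives the expected result only when identical names sit on adjacent rows, as they do in the question's table.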
Upvotes: 0