Reputation: 4353
This seemed at first to be a basic process but I keep getting the following error:
TypeError: 'numpy.float64' object is not iterable
I have a pandas DataFrame of people and their performance ratings, and I want the average of each person's top two ratings. I wrote the following function:
def second(num):
    bk = max(num)
    count = 0
    m1 = m2 = float('-inf')
    for x in num:
        count += 1
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return np.mean([m2, bk]) if count >= 2 else None
The DataFrame looks like this:
Person Rat
8612 Jeff Smith 2.4
9178 Jeff Smith 7.2
9767 Jeff Smith 9.9
10359 Jeff Smith 9.6
10963 Jeff Smith 6.6
11515 Jeff Smith 4.9
12095 Jeff Smith 3.2
12697 Jeff Smith 1.1
I did the following and received an error:
df['avg'] = df.Rat.apply(lambda x: second(x))
Upvotes: 0
Views: 382
Reputation: 177048
One approach is to sort df first, and then use groupby and aggregate with head and mean:
>>> df.sort_values('Rat', ascending=False).groupby('Person').agg(lambda x: x.head(2).mean())
Rat
Person
Jeff Smith 9.75
This will give you the mean of each person's two highest ratings.
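If you also want that value written back onto every row, as the df['avg'] assignment in the question suggests, here is a minimal sketch using groupby with transform. It rebuilds the sample frame from the question; using nlargest(2) in place of sort-then-head is my own substitution, not something the question requires.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Person": ["Jeff Smith"] * 8,
        "Rat": [2.4, 7.2, 9.9, 9.6, 6.6, 4.9, 3.2, 1.1],
    },
    index=[8612, 9178, 9767, 10359, 10963, 11515, 12095, 12697],
)

# One value per person: mean of that person's two largest ratings.
top2_mean = df.groupby("Person")["Rat"].apply(lambda s: s.nlargest(2).mean())

# Same value broadcast back to every row of the person's group.
df["avg"] = df.groupby("Person")["Rat"].transform(lambda s: s.nlargest(2).mean())

print(top2_mean)
print(df)
```

transform keeps the original index, so the new column lines up with the existing rows without any sorting.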
Upvotes: 2
Reputation: 13788
As written, you are applying your function to a Series, not to a DataFrame. When you run Series.apply, your function is called once per element of the series rather than on the series as a whole. That is why you get the error: second receives a single numpy.float64 and tries to iterate over it.
When you call DataFrame.apply, things are different: in that context your function is called once per column (or row) of the dataframe, and each column is passed in as a whole Series.
Try: df[['Rat']].apply(second). Notice that I put df[['Rat']], not df['Rat']. The extra set of brackets forces the slice to return a single-column dataframe rather than a series. The result is a Series indexed by column name, so pull the value out before assigning it: df['avg'] = df[['Rat']].apply(second)['Rat'].
Does that work?
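To see the difference for yourself, here is a small sketch that checks what each apply actually hands to your function. The helper is_series is a name I made up for the illustration; the data is the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({"Rat": [2.4, 7.2, 9.9, 9.6, 6.6, 4.9, 3.2, 1.1]})

def is_series(obj):
    # Hypothetical helper: reports whether the argument is a whole Series.
    return isinstance(obj, pd.Series)

# Series.apply calls the function once per element, so it never sees a Series.
elementwise = df["Rat"].apply(is_series)

# DataFrame.apply calls the function once per column, passing each column as a Series.
columnwise = df[["Rat"]].apply(is_series)

print(elementwise.any())
print(columnwise["Rat"])
```

If your function needs the whole column, you can also skip apply and call it directly on df['Rat'].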
Upvotes: 0
Reputation: 1192
You could try this:
In [5]: df = pd.read_clipboard()
In [6]: df
Out[6]:
Person Rat
8612 Jeff Smith 2.4
9178 Jeff Smith 7.2
9767 Jeff Smith 9.9
10359 Jeff Smith 9.6
10963 Jeff Smith 6.6
11515 Jeff Smith 4.9
12095 Jeff Smith 3.2
12697 Jeff Smith 1.1
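Sort the dataframe on Rat in descending order (DataFrame.sort was removed in later pandas releases; sort_values is the replacement):
In [18]: df = df.sort_values("Rat", ascending=False)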
In [19]: df
Out[19]:
Person Rat
9767 Jeff Smith 9.9
10359 Jeff Smith 9.6
9178 Jeff Smith 7.2
10963 Jeff Smith 6.6
11515 Jeff Smith 4.9
12095 Jeff Smith 3.2
8612 Jeff Smith 2.4
12697 Jeff Smith 1.1
Get the average of the top two values of Rat.
In [21]: avg = df.head(2).loc[:, "Rat"].mean()
In [24]: avg
Out[24]: 9.75
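The same number can be had without sorting the whole frame. Below is a rough sketch of a helper, which I have called top_two_mean (a made-up name), that also mirrors the question's rule of returning None when there are fewer than two ratings. Dropping NaN values first is my assumption about how missing ratings should be treated.

```python
import pandas as pd

def top_two_mean(ratings):
    """Mean of the two largest values, or None with fewer than two values."""
    # Assumption: missing ratings are ignored rather than counted.
    top = ratings.dropna().nlargest(2)
    return top.mean() if len(top) == 2 else None

# Sample ratings copied from the question.
rat = pd.Series([2.4, 7.2, 9.9, 9.6, 6.6, 4.9, 3.2, 1.1])

avg = top_two_mean(rat)
too_short = top_two_mean(pd.Series([5.0]))

print(avg)
print(too_short)
```

Note that this, like the sort-and-head version, looks at the whole column, so with more than one person in the frame you would call it per group.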
Upvotes: 0