itjcms18

Reputation: 4353

Applying function to pandas DataFrame column gives numpy error

This seemed at first like a basic task, but I keep getting the following error:

TypeError: 'numpy.float64' object is not iterable

I have a pandas DataFrame with a person and his performance scores (the Rat column). I want to find the average of his top two scores. I wrote the following function:

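import numpy as np

def second(num):
    # Mean of the two largest values in num; None if it has fewer than two.
    bk = max(num)  # largest value; max() needs an iterable, not a scalar
    count = 0
    m1 = m2 = float('-inf')  # running largest and second-largest
    for x in num:
        count += 1
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return np.mean([m2, bk]) if count >= 2 else None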
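For comparison, the same "mean of the two largest values" can be computed with heapq.nlargest from the standard library. This is a swapped-in technique, not the original code, and top_two_mean is a hypothetical name. Like second, it expects an iterable of numbers, which is the crux of the error above.

```python
import heapq

import numpy as np


def top_two_mean(values):
    # Hypothetical helper: same result as second(), via heapq.nlargest.
    # Accepts any iterable of numbers (list, NumPy array, pandas Series).
    top = heapq.nlargest(2, values)
    # Fewer than two values: mirror second() and return None.
    return float(np.mean(top)) if len(top) == 2 else None


print(top_two_mean([2.4, 7.2, 9.9, 9.6]))  # 9.75
print(top_two_mean([5.0]))                 # None
```

heapq.nlargest avoids the manual bookkeeping of m1 and m2 while keeping the same handling of ties (two equal maxima both count).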

The DataFrame looks like this:

            Person  Rat
8612    Jeff Smith  2.4
9178    Jeff Smith  7.2
9767    Jeff Smith  9.9
10359   Jeff Smith  9.6
10963   Jeff Smith  6.6
11515   Jeff Smith  4.9
12095   Jeff Smith  3.2
12697   Jeff Smith  1.1
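To reproduce the problem without copying from the clipboard, a minimal sketch that rebuilds the sample frame above (values and index labels copied from the table):

```python
import pandas as pd

# Sample data copied from the table above, including the index labels.
df = pd.DataFrame(
    {
        "Person": ["Jeff Smith"] * 8,
        "Rat": [2.4, 7.2, 9.9, 9.6, 6.6, 4.9, 3.2, 1.1],
    },
    index=[8612, 9178, 9767, 10359, 10963, 11515, 12095, 12697],
)
print(df)
```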

I did the following and received an error:

df['avg'] = df.Rat.apply(lambda x: second(x))
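A small sketch of where the error comes from, assuming the same kind of float column: Series.apply passes the function one element at a time, and the first line of second is max(num), which cannot iterate over a single number.

```python
import numpy as np
import pandas as pd

rat = pd.Series([2.4, 7.2, 9.9, 9.6], name="Rat")

# Series.apply calls the function once per element, so each call
# receives a single number (ndim 0), never the whole column.
dims = rat.apply(lambda x: np.ndim(x))
print(dims.tolist())  # [0, 0, 0, 0]

# max() on a lone NumPy scalar reproduces the message from the question.
message = None
try:
    max(np.float64(2.4))
except TypeError as exc:
    message = str(exc)
print(message)  # 'numpy.float64' object is not iterable

# Given the whole column, max() has something to iterate over.
print(max(rat))  # 9.9
```

So calling the function on the column itself, as in second(df['Rat']), avoids the error because the function then receives an iterable.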

Upvotes: 0

Views: 382

Answers (3)

Alex Riley

Reputation: 177048

One approach is to sort df first, then group by Person and aggregate each group with head(2) followed by mean:

>>> df.sort_values('Rat', ascending=False).groupby('Person').agg(lambda x: x.head(2).mean())
             Rat
Person          
Jeff Smith  9.75

This will give you the mean of each person's two highest ratings.
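The question stores the result in a new column, df['avg']. A minimal sketch, assuming you want every row to carry its person's top-two mean: keep the sort + head(2) idea but use transform instead of agg, so the per-group value is broadcast back to each row of that group. The second person ("Ann Lee") and their ratings are hypothetical, added only to show more than one group.

```python
import pandas as pd

# Hypothetical sample: Jeff Smith's ratings plus an invented second person.
df = pd.DataFrame(
    {
        "Person": ["Jeff Smith"] * 4 + ["Ann Lee"] * 3,
        "Rat": [2.4, 9.9, 9.6, 6.6, 5.0, 8.0, 7.0],
    }
)

# Sort descending, then within each person take the first two rows and
# average them; transform repeats that scalar on every row of the group.
top_two = (
    df.sort_values("Rat", ascending=False)
    .groupby("Person")["Rat"]
    .transform(lambda s: s.head(2).mean())
)

# The result is indexed like the sorted frame; assignment aligns on labels.
df["avg"] = top_two
print(df)
```

Because the assignment aligns on index labels, the original row order of df is unaffected by the sort.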

Upvotes: 2

8one6

Reputation: 13788

As written, you are applying your function to a Series, not to a DataFrame. When you call Series.apply, your function is applied to each element of the Series in turn, rather than to the Series as a whole. That is why you get the iteration error: second receives a single float, and max() cannot iterate over it.

With DataFrame.apply, things are different: your function is applied to each column (the default, axis=0) or to each row (axis=1), and each call receives that column or row as a Series.
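The difference can be checked directly. This sketch uses a small float column and a throwaway lambda that only reports whether it was handed a Series:

```python
import pandas as pd

df = pd.DataFrame({"Rat": [2.4, 7.2, 9.9]})

# Series.apply: the function is called once per element (a scalar).
per_element = df["Rat"].apply(lambda v: isinstance(v, pd.Series))

# DataFrame.apply (default axis=0): the function is called once per
# column, and each call receives that whole column as a Series.
per_column = df[["Rat"]].apply(lambda col: isinstance(col, pd.Series))

print(per_element.tolist())  # [False, False, False]
print(per_column.tolist())   # [True]
```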

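Try: df[['Rat']].apply(second). Notice that I put df[['Rat']], not df['Rat']. The extra set of brackets makes the selection return a single-column DataFrame rather than a Series. The result is a Series with one value per column (here just Rat), so assigning it straight to df['avg'] would align on the row index and fill the column with NaN. To store the value on every row, assign the scalar instead, e.g. df['avg'] = df[['Rat']].apply(second)['Rat'], or simply df['avg'] = second(df['Rat']).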

Does that work?
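To see what a DataFrame.apply call like this returns, here is a sketch with a hypothetical stand-in for second (top_two, the mean of a column's two largest values). It shows that the result is indexed by column name, and why assigning that Series directly to a new column yields NaN while assigning the scalar does not.

```python
import pandas as pd

df = pd.DataFrame(
    {"Rat": [2.4, 7.2, 9.9, 9.6]},
    index=[8612, 9178, 9767, 10359],
)


# Hypothetical stand-in for second(): mean of the two largest values.
def top_two(col):
    return col.sort_values(ascending=False).head(2).mean()


per_column = df[["Rat"]].apply(top_two)
print(per_column)  # one entry, indexed by the column name "Rat"

# Assigning that Series aligns on the row index (8612, 9178, ...),
# which shares no labels with ["Rat"], so every row becomes NaN.
df["avg_aligned"] = per_column

# Assigning the scalar instead broadcasts it to every row.
df["avg"] = per_column["Rat"]
print(df)
```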

Upvotes: 0

svenkatesh

Reputation: 1192

You could try this:

In [5]: df = pd.read_clipboard()

In [6]: df
Out[6]:
           Person  Rat
8612  Jeff  Smith  2.4
9178  Jeff  Smith  7.2
9767  Jeff  Smith  9.9
10359 Jeff  Smith  9.6
10963 Jeff  Smith  6.6
11515 Jeff  Smith  4.9
12095 Jeff  Smith  3.2
12697 Jeff  Smith  1.1

Sort the DataFrame on Rat in descending order:

In [18]: df = df.sort_values("Rat", ascending=False)

In [19]: df
Out[19]:
           Person  Rat
9767  Jeff  Smith  9.9
10359 Jeff  Smith  9.6
9178  Jeff  Smith  7.2
10963 Jeff  Smith  6.6
11515 Jeff  Smith  4.9
12095 Jeff  Smith  3.2
8612  Jeff  Smith  2.4
12697 Jeff  Smith  1.1

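Then take the average of the top two values of Rat. Because head(2) looks at the whole frame, this assumes the DataFrame holds only one person's ratings.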

In [21]: avg = df.head(2).loc[:, "Rat"].mean()

In [24]: avg
Out[24]: 9.75
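A one-step alternative to sorting and then calling head, using Series.nlargest (a swapped-in technique, not part of the answer above). pandas documents nlargest as typically faster than sort_values(ascending=False).head(n) when n is small relative to the Series. Like head(2) on the whole frame, this sketch assumes a single person.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Person": ["Jeff Smith"] * 8,
        "Rat": [2.4, 7.2, 9.9, 9.6, 6.6, 4.9, 3.2, 1.1],
    }
)

# Take the two largest ratings directly, then average them.
avg = df["Rat"].nlargest(2).mean()
print(avg)  # 9.75
```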

Upvotes: 0
