Jonathan Bechtel

Iterating Over Every Item in a Series in Pandas With a Custom Function

I have a pandas DataFrame that stores its data like this:

           Player     Year    Height
1     Stephen Curry  2015-16  6-3
2   Mirza Teletovic  2015-16  6-10
3        C.J. Miles  2015-16  6-7
4  Robert Covington  2015-16  6-9

Right now data['Height'] stores its values as strings, and I'd like to convert them to inches, stored as integers, for further calculations.

I've tried a few approaches, including what's listed in the Pandas documentation, but to no avail.

First Attempt

def true_height(string):
    new_str = string.split('-')
    inches1 = new_str[0]
    inches2 = new_str[1]

    inches1 = int(inches1)*12
    inches2 = int(inches2)

    return inches1 + inches2

If you run

true_height(data.iloc[0, 2])

it returns 75, which is correct.

To run it on the entire Series, I changed the split line to:

new_str = string.str.split('-')

And then ran:

data['Height'].apply(true_height(data['Height']))

And got the following error message:

int() argument must be a string or a number, not 'list'
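A minimal reproduction shows where the list comes from. It assumes pandas is installed and uses a copy of the sample data with a default 0-based index. Series.str.split('-') returns a Series whose elements are lists, so new_str[0] picks out a whole list such as ['6', '3'] rather than the string '6'. The code writes this as .iloc[0] to select the first row by position. int() cannot convert a list, so it raises a TypeError. The exact wording of the message varies between Python versions.

```python
import pandas as pd

# Sample data mirroring the question, with a default 0-based index.
data = pd.DataFrame({
    "Player": ["Stephen Curry", "Mirza Teletovic", "C.J. Miles", "Robert Covington"],
    "Year": ["2015-16"] * 4,
    "Height": ["6-3", "6-10", "6-7", "6-9"],
})

# Vectorized split: every element of the result is a list, not a string.
new_str = data["Height"].str.split("-")
first = new_str.iloc[0]
print(first)  # ['6', '3']

# Passing that list to int() fails the same way as in the question.
message = None
try:
    int(first)
except TypeError as exc:
    message = str(exc)
print(message)
```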

I then tried using a for loop, thinking that might do the trick, and modified the original function to this:

def true_height(strings):
    for string in strings:
        new_str = string.split('-')
        inches1 = new_str[0]
        inches2 = new_str[1]

        inches1 = int(inches1)*12
        inches2 = int(inches2)

        return inches1 + inches2

Now, when I run:

data['Height'].apply(true_height(data['Height']))

I get the following error:

'int' object is not callable

I'm a little stumped. Any help would be appreciated. Thank you.
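The second error has a different cause. Python evaluates true_height(data['Height']) before apply is ever called. Because the return statement sits inside the loop, that call stops after the first height and returns the integer 75. apply then receives 75 instead of a function and fails when it tries to call it. The sketch below demonstrates this. It assumes pandas is installed, uses the sample heights, and true_height_loop is a hypothetical renamed copy of the second attempt.

```python
import pandas as pd

heights = pd.Series(["6-3", "6-10", "6-7", "6-9"])

def true_height_loop(strings):
    # Mirrors the second attempt: the return sits inside the loop,
    # so only the first string is ever converted.
    for string in strings:
        feet, inches = string.split("-")
        return int(feet) * 12 + int(inches)

# The argument is evaluated first, so apply() receives 75, not a function.
arg = true_height_loop(heights)
print(arg)  # 75

message = None
try:
    heights.apply(arg)
except TypeError as exc:
    message = str(exc)
print(message)
```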


Answers (2)

Alexander

df['feet'], df['inches'] = zip(*df.Height.str.split('-'))

df['feet'] = df.feet.astype(int)
df['inches'] = df.inches.astype(int)
df['height_inches'] = df.feet * 12 + df.inches

>>> df
             Player     Year  Height  feet  inches  height_inches
1     Stephen Curry  2015-16     6-3     6       3             75
2   Mirza Teletovic  2015-16    6-10     6      10             82
3        C.J. Miles  2015-16     6-7     6       7             79
4  Robert Covington  2015-16     6-9     6       9             81
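Here is a self-contained version of the split-and-zip approach, assuming pandas is installed and rebuilding the question's sample frame. zip(*...) transposes the per-row lists ['6', '3'], ['6', '10'], ... into one tuple of feet strings and one tuple of inch strings. Those tuples are assigned as two new columns and then converted to integers.

```python
import pandas as pd

# Sample data from the question, with its 1-based index.
df = pd.DataFrame({
    "Player": ["Stephen Curry", "Mirza Teletovic", "C.J. Miles", "Robert Covington"],
    "Year": ["2015-16"] * 4,
    "Height": ["6-3", "6-10", "6-7", "6-9"],
}, index=[1, 2, 3, 4])

# Transpose the per-row [feet, inches] lists into two column-wise tuples.
df["feet"], df["inches"] = zip(*df["Height"].str.split("-"))

# Convert the string pieces to integers and combine them.
df["feet"] = df["feet"].astype(int)
df["inches"] = df["inches"].astype(int)
df["height_inches"] = df["feet"] * 12 + df["inches"]
print(df)
```

An alternative is df["Height"].str.split("-", expand=True). It returns a DataFrame with one column per piece, so zip is not needed.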


akuiper

You can use apply on the Height column after it has been split into lists, passing it a lambda function that does the conversion:

df['Height'] = df.Height.str.split("-").apply(lambda x: int(x[0]) * 12 + int(x[1]))

df
#             Player       Year    Height
# 1    Stephen Curry    2015-16        75
# 2  Mirza Teletovic    2015-16        82
# 3       C.J. Miles    2015-16        79
# 4 Robert Covington    2015-16        81
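Below is a runnable version of the lambda approach, assuming pandas is installed and rebuilding the question's sample frame. After .str.split("-"), each element is a list such as ["6", "10"]. The lambda converts the feet to inches and adds the remaining inches, overwriting the Height column with integers.

```python
import pandas as pd

# Sample data from the question, with its 1-based index.
df = pd.DataFrame({
    "Player": ["Stephen Curry", "Mirza Teletovic", "C.J. Miles", "Robert Covington"],
    "Year": ["2015-16"] * 4,
    "Height": ["6-3", "6-10", "6-7", "6-9"],
}, index=[1, 2, 3, 4])

# x is a two-element list: x[0] is feet, x[1] is inches.
df["Height"] = df["Height"].str.split("-").apply(lambda x: int(x[0]) * 12 + int(x[1]))
print(df)
```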

Or use your originally defined true_height function (1st attempt) with apply:

df['Height'] = df.Height.apply(true_height)

You just don't need to pass df.Height to the function, since apply takes the function itself as its argument and calls it once for each element.
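To make the difference concrete, here is a short sketch, assuming pandas is installed. It uses a compact restatement of the first-attempt function and passes the function object (true_height, without parentheses) to apply. Series.map also accepts a function and gives the same result for this conversion.

```python
import pandas as pd

def true_height(string):
    # Convert one "feet-inches" string such as "6-10" to total inches.
    feet, inches = string.split("-")
    return int(feet) * 12 + int(inches)

heights = pd.Series(["6-3", "6-10", "6-7", "6-9"])

# Pass the function itself; apply() calls it on each element.
converted = heights.apply(true_height)
print(converted.tolist())  # [75, 82, 79, 81]

# Series.map with a function behaves the same way here.
mapped = heights.map(true_height)
```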
