Reputation: 3617
I have a dataframe in Pandas that lists its information like this:
   Player            Year     Height
1  Stephen Curry     2015-16  6-3
2  Mirza Teletovic   2015-16  6-10
3  C.J. Miles        2015-16  6-7
4  Robert Covington  2015-16  6-9
Right now data['Height'] stores its values as strings, and I'd like to convert these values into inches stored as integers for further calculation.
I've tried a few approaches, including what's listed in the Pandas documentation, but to no avail.
First Attempt
def true_height(string):
    new_str = string.split('-')
    inches1 = new_str[0]
    inches2 = new_str[1]
    inches1 = int(inches1)*12
    inches2 = int(inches2)
    return inches1 + inches2
If you run
true_height(data.iloc[0, 2])
It returns 75, the correct answer.
To run it on the entire series I changed this line of code:
    new_str = string.str.split('-')
And then ran:
data['Height'].apply(true_height(data['Height']))
And got the following error message:
int() argument must be a string or a number, not 'list'
I then tried using a for loop, thinking that might do the trick, so I modified the original function to this:
def true_height(strings):
    for string in strings:
        new_str = string.split('-')
        inches1 = new_str[0]
        inches2 = new_str[1]
        inches1 = int(inches1)*12
        inches2 = int(inches2)
        return inches1 + inches2
Now, when I run:
data['Height'].apply(true_height(data['Height']))
I get the following error:
'int' object is not callable
I'm a little stumped. Any help would be appreciated. Thank you.
Upvotes: 1
Views: 155
Reputation: 109706
df['feet'], df['inches'] = zip(*df.Height.str.split('-'))
df['feet'] = df.feet.astype(int)
df['inches'] = df.inches.astype(int)
df['height_inches'] = df.feet * 12 + df.inches
>>> df
   Player            Year     Height  feet  inches  height_inches
1  Stephen Curry     2015-16  6-3        6       3             75
2  Mirza Teletovic   2015-16  6-10       6      10             82
3  C.J. Miles        2015-16  6-7        6       7             79
4  Robert Covington  2015-16  6-9        6       9             81
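A variant of the same idea, offered as a sketch rather than a replacement: Series.str.split accepts expand=True, which returns the split parts as DataFrame columns directly, so the zip step is not needed. The sample frame below is rebuilt by hand from the question, and the name parts is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real frame)
df = pd.DataFrame(
    {
        "Player": ["Stephen Curry", "Mirza Teletovic", "C.J. Miles", "Robert Covington"],
        "Year": ["2015-16"] * 4,
        "Height": ["6-3", "6-10", "6-7", "6-9"],
    },
    index=[1, 2, 3, 4],
)

# expand=True gives one column per split part: column 0 is feet, column 1 is inches
parts = df["Height"].str.split("-", expand=True).astype(int)

# Vectorized arithmetic on the two integer columns
df["height_inches"] = parts[0] * 12 + parts[1]
```

This assumes every value has exactly the feet-inches form; rows with missing or malformed heights would need to be handled before the astype(int) call.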
Upvotes: 1
Reputation: 215117
You can split the Height column into lists, then use apply with a lambda function to do the conversion:
df['Height'] = df.Height.str.split("-").apply(lambda x: int(x[0]) * 12 + int(x[1]))
df
#    Player            Year     Height
# 1  Stephen Curry     2015-16      75
# 2  Mirza Teletovic   2015-16      82
# 3  C.J. Miles        2015-16      79
# 4  Robert Covington  2015-16      81
Or use your originally defined true_height function (first attempt) with apply:
df['Height'] = df.Height.apply(true_height)
You don't need to pass df.Height to the function yourself, since apply takes a function as its argument and calls it on each element.
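To make the difference concrete, here is a small sketch using a hand-built Series (heights is a stand-in for the question's data['Height']). Passing true_height hands the function itself to apply, which calls it once per element. Writing true_height(heights) calls the function immediately on the whole Series, so it fails before apply ever runs.

```python
import pandas as pd

def true_height(string):
    # "6-3" -> 6 feet * 12 + 3 inches
    feet, inches = string.split('-')
    return int(feet) * 12 + int(inches)

# Stand-in for data['Height'] from the question
heights = pd.Series(['6-3', '6-10', '6-7', '6-9'])

# Correct: pass the function object; apply calls it on each string
converted = heights.apply(true_height)

# Incorrect: true_height(heights) runs first, on the Series itself,
# and a Series has no plain .split method
try:
    heights.apply(true_height(heights))
    failed = False
except AttributeError:
    failed = True
```

The same reasoning explains the errors in the question: in both attempts the function was called up front, and apply received its return value (or never ran at all) instead of a callable.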
Upvotes: 1