Reputation: 107
I have a Pandas dataframe that I'm working with and I simply need to divide all values in a certain column that are greater than 800 by 100. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra places to the left of the decimal place. For example...
id credit_score column_b column_c
0 750 ... ...
1 653 ... ...
2 741 ... ...
3 65100 ... ...
4 73500 ... ...
5 565 ... ...
6 480 ... ...
7 78900 ... ...
8 699 ... ...
9 71500 ... ...
So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.
Upvotes: 6
Views: 33121
Reputation: 863166
You can use mask
:
df.credit_score = df.credit_score.mask( df.credit_score > 800, df.credit_score/ 100)
Or numpy.where
:
df.credit_score = np.where( df.credit_score > 800, df.credit_score/ 100, df.credit_score)
print (df)
id credit_score col col1
0 0 750 750.0 750.0
1 1 653 653.0 653.0
2 2 741 741.0 741.0
3 3 65100 651.0 651.0
4 4 73500 735.0 735.0
5 5 565 565.0 565.0
6 6 480 480.0 480.0
7 7 78900 789.0 789.0
8 8 699 699.0 699.0
9 9 71500 715.0 715.0
Upvotes: 8
Reputation: 210882
I'd use Pandas boolean indexing:
In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100
In [194]: df
Out[194]:
credit_score
id
0 750.0
1 653.0
2 741.0
3 651.0
4 735.0
5 565.0
6 480.0
7 789.0
8 699.0
9 715.0
Upvotes: 3
Reputation: 81654
You can use Series.apply
. It accepts a function and applies it on every element in the series. Note that it is not inplace and you will in need to reassign the series that it returns, either to a new column or to the same column.
def fix_scores(score):
return score / 100 if score > 800 else score
# same as
# if score > 800:
# return score / 100
# return score
df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
Upvotes: 4