Reputation: 323
I have a pandas DataFrame with a Score column. For each Name, I would like to check whether the Score improved compared with that name's previous entry.
If the Score for a Name did improve, I'd like to write 1, otherwise 0. If there is no previous Score available for a Name, I'd like to write NaN.
So my dataframe looks like this:
import pandas as pd
import numpy as np
first = {
'Date':['2013-02-28','2013-03-29','2013-05-29','2013-06-29','2013-02-27','2013-04-30','2013-01-20'],
'Name':['Felix','Felix','Felix','Felix','Peter','Peter','Paul'],
'Score':['10','12','13','11','14','14','9']}
df1 = pd.DataFrame(first)
And the result should look like this:
second = {
'Date':['2013-02-28','2013-03-29','2013-05-29','2013-06-29','2013-02-27','2013-04-30','2013-01-20'],
'Name':['Felix','Felix','Felix','Felix','Peter','Peter','Paul'],
'Score':['10','12','13','11','14','14','9'],
'Improvement':['NaN','1','1','0','NaN','0','NaN']}
result = pd.DataFrame(second)
I considered doing something like:
df1['Improvement'] = np.nan
col_idx = df1.columns.get_loc('Improvement')
# group by the Name column (the frame has no 'ID' column)
grouped = df1[df1['Name'].isin(['Felix', 'Peter', 'Paul'])].groupby('Name')
for name, group in grouped:
    first = True
    for index, row in group.iterrows(): ...
But I actually have more than 100 names in the Name column, so listing them explicitly does not scale.
Upvotes: 0
Views: 34
Reputation: 1167
This can probably be simplified, but you can use a groupby with shift to build a helper column holding each name's previous score (NaN for the first score of each name), then apply np.where for the logic you want:
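df1['Score'] = pd.to_numeric(df1['Score'])  # scores are strings in the question
df1['v'] = df1.groupby('Name')['Score'].shift()  # previous score per name
df1['Improvement'] = np.where(df1['Score'] > df1['v'], 1, 0)
# no previous score for this name -> NaN
df1['Improvement'] = np.where(df1['v'].isna(), np.nan, df1['Improvement'])
print(df1.drop(columns='v'))
         Date   Name  Score  Improvement
0  2013-02-28  Felix     10          NaN
1  2013-03-29  Felix     12          1.0
2  2013-05-29  Felix     13          1.0
3  2013-06-29  Felix     11          0.0
4  2013-02-27  Peter     14          NaN
5  2013-04-30  Peter     14          0.0
6  2013-01-20   Paul      9          NaN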
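One possible simplification of the shift-and-np.where approach above is a grouped diff. The sketch below is only an illustration under a few assumptions that the question does not state outright: scores should be compared as numbers rather than strings, "previous" means the earlier Date for the same Name, and the variable names df and delta are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2013-02-28', '2013-03-29', '2013-05-29', '2013-06-29',
             '2013-02-27', '2013-04-30', '2013-01-20'],
    'Name': ['Felix', 'Felix', 'Felix', 'Felix', 'Peter', 'Peter', 'Paul'],
    'Score': ['10', '12', '13', '11', '14', '14', '9'],
})

# Assumption: scores compare as numbers and dates chronologically.
df['Score'] = pd.to_numeric(df['Score'])
df['Date'] = pd.to_datetime(df['Date'])

# Order rows by date so that "previous" means the earlier date (stable sort).
df = df.sort_values('Date', kind='mergesort')

# Change from the previous score of the same Name; NaN on each name's first row.
delta = df.groupby('Name')['Score'].diff()

# True/False becomes 1.0/0.0, then rows without a previous score are set to NaN.
df['Improvement'] = delta.gt(0).astype(float).where(delta.notna())

df = df.sort_index()  # restore the original row order
print(df)
```

Because the grouping is done by pandas, nothing here depends on how many distinct names the Name column contains. A tie (Peter's 14 followed by 14) counts as no improvement and yields 0.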
Upvotes: 1