Reputation: 25
I'm new to Python, and I'm lost.
I have a dataframe named d that is like this:
name win loser sortvalue
joe yes car1
jan yes car1
stan yes car1
fred yes car2
josh yes car2
And I want it to look like this:
win loser sortvalue
joe jan car1
joe stan car1
josh fred car2
So I want to sort it by "sortvalue" and then find the cells in the "win" column that say "yes" and replace them with the corresponding "name" in the "loser" column/rows that say "yes."
What I have done so far is sort the dataframe using groupby:
d = pd.read_excel('nameoffile.xls', sheet_name='jw')
df = pd.DataFrame(d)
df1 = dict(tuple(df.groupby('sortvalue')))
This gives me back a series of dictionaries that are sorted by "sortvalue." But I can't figure out how to write the function that iterates over these dictionaries to replace the "yes" in the win with the "name" in the columns that say "yes" in "loser." And I'm also unsure if groupby is the right thing to do. Is it right to make a dictionary?
Upvotes: 2
Views: 117
Reputation: 863166
Filter DataFrame by yes
values, rename
columns and use DataFrame.merge
:
df1 = df.loc[df['win'] == 'yes', ['name','sortvalue']].rename(columns={'name':'win'})
df2 = df.loc[df['loser'] == 'yes', ['name','sortvalue']].rename(columns={'name':'loser'})
df = pd.merge(df1, df2, on='sortvalue')[['win','loser','sortvalue']]
print (df)
win loser sortvalue
0 joe jan car1
1 joe stan car1
2 josh fred car2
Upvotes: 3
Reputation: 674
Maybe a simpler version using logic similar to first ans.
d = [['joe','yes',np.nan,'cart1'],['jan',np.nan,'yes','cart1'],['stan',np.nan,'yes','cart1'],['fred',np.nan,'yes','cart2'],['stan','yes',np.nan,'cart2']]
d = pd.DataFrame(data=d,columns=['name','win','lose','value'])
d1 = d[d['win']=='yes'].loc[:,['name','value']]
d2 = d[d['lose']=='yes'].loc[:,['name','value']]
d1['winner']=d1['name']
d2['loser']=d2['name']
ans = pd.merge(d1,d2,on='value').loc[:,['winner','loser','value']]
Upvotes: 0
Reputation: 4025
Build a new dataframe while grouping over sortvalue:
results = []
for car, row in df.groupby('sortvalue'):
winner = row[row['win'] == 'yes'].iloc[0]
for _, item in row[row['loser'] == 'yes'].iterrows():
loser = item
results.append({
'win': winner['name'],
'loser': loser['name'],
'sortvalue': car
})
df = pd.DataFrame(results)
Upvotes: 0