joefromthebus
joefromthebus

Reputation: 25

how to group by value in column, find null, then replace

I'm new to Python, and I'm lost.

I have a dataframe named d that is like this:

    name  win   loser   sortvalue  
    joe   yes           car1        
    jan         yes     car1 
    stan        yes     car1
    fred        yes     car2
    josh  yes           car2

And I want it to look like this:

    win   loser   sortvalue
    joe   jan     car1
    joe   stan    car1
    josh  fred    car2

So I want to sort it by "sortvalue" and then find the cells in the "win" column that say "yes" and replace them with the corresponding "name" in the "loser" column/rows that say "yes."

What I have done so far is sort the dataframe using groupby:

    d = pd.read_excel('nameoffile.xls', sheet_name='jw')
    df = pd.DataFrame(d)
    df1 = dict(tuple(df.groupby('sortvalue')))

This gives me back a series of dictionaries that are sorted by "sortvalue." But I can't figure out how to write the function that iterates over these dictionaries to replace the "yes" in the win with the "name" in the columns that say "yes" in "loser." And I'm also unsure if groupby is the right thing to do. Is it right to make a dictionary?

Upvotes: 2

Views: 117

Answers (3)

jezrael
jezrael

Reputation: 863166

Filter DataFrame by yes values, rename columns and use DataFrame.merge:

df1 = df.loc[df['win'] == 'yes', ['name','sortvalue']].rename(columns={'name':'win'})
df2 = df.loc[df['loser'] == 'yes', ['name','sortvalue']].rename(columns={'name':'loser'})

df = pd.merge(df1, df2, on='sortvalue')[['win','loser','sortvalue']]
print (df)
    win loser sortvalue
0   joe   jan      car1
1   joe  stan      car1
2  josh  fred      car2

Upvotes: 3

Parijat Bhatt
Parijat Bhatt

Reputation: 674

Maybe a simpler version using logic similar to first ans.

d = [['joe','yes',np.nan,'cart1'],['jan',np.nan,'yes','cart1'],['stan',np.nan,'yes','cart1'],['fred',np.nan,'yes','cart2'],['stan','yes',np.nan,'cart2']]
d = pd.DataFrame(data=d,columns=['name','win','lose','value'])

d1 = d[d['win']=='yes'].loc[:,['name','value']]
d2 = d[d['lose']=='yes'].loc[:,['name','value']]
d1['winner']=d1['name']
d2['loser']=d2['name']
ans = pd.merge(d1,d2,on='value').loc[:,['winner','loser','value']]

Upvotes: 0

Joost Döbken
Joost Döbken

Reputation: 4025

Build a new dataframe while grouping over sortvalue:

results = []
for car, row in df.groupby('sortvalue'):
    winner = row[row['win'] == 'yes'].iloc[0]
    for _, item in row[row['loser'] == 'yes'].iterrows():
        loser = item
        results.append({
            'win': winner['name'],
            'loser': loser['name'],
            'sortvalue': car
        })
df = pd.DataFrame(results)

Upvotes: 0

Related Questions