Duccio Piovani

Reputation: 1460

Change the values of column after having used groupby on another column (pandas dataframe)

I have two data frames, one with the coordinates of places

import numpy as np
import pandas as pd

coord = pd.DataFrame()
coord['Index'] = ['A','B','C']
coord['x'] = np.random.random(coord.shape[0])
coord['y'] = np.random.random(coord.shape[0])


coord 

  Index         x         y
0     A  0.888025  0.376416
1     B  0.052976  0.396243
2     C  0.564862  0.301380

and one with several values measured in the places

df = pd.DataFrame()
df['Index'] = ['A','A','B','B','B','C','C','C','C']
df['Value'] = np.random.random(df.shape[0])

df
  Index     Value
0     A  0.930298
1     A  0.144550
2     B  0.393952
3     B  0.680941
4     B  0.657807
5     C  0.704954
6     C  0.733328
7     C  0.099785
8     C  0.871678

I want to find an efficient way of assigning the coordinates to the df data frame. For the moment I have tried

df['x'] = np.zeros(df.shape[0])
df['y'] = np.zeros(df.shape[0])
for i in df.Index.unique():
    df.loc[df.Index == i, 'x'] = coord.loc[coord.Index == i,'x'].values
    df.loc[df.Index == i, 'y'] = coord.loc[coord.Index == i,'y'].values

which works and yields

  Index     Value         x         y
0     A  0.220323  0.983739  0.121289
1     A  0.115075  0.983739  0.121289
2     B  0.432688  0.809586  0.639811
3     B  0.106178  0.809586  0.639811
4     B  0.259465  0.809586  0.639811
5     C  0.804018  0.827192  0.156095
6     C  0.552053  0.827192  0.156095
7     C  0.412345  0.827192  0.156095
8     C  0.235106  0.827192  0.156095

but this is quite sloppy, and highly inefficient. I tried to use the groupby operation like this

df['x'] = np.zeros(df.shape[0])
df['y'] = np.zeros(df.shape[0])
gb = df.groupby('Index')
for k in gb.groups.keys():
    gb.get_group(k)['x'] = coord.loc[coord.Index == k, 'x']
    gb.get_group(k)['y'] = coord.loc[coord.Index == k, 'y']

but I get this warning:

/anaconda/lib/python2.7/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I understand the problem, but I don't know how to overcome it.

Any suggestions?

Upvotes: 0

Views: 164

Answers (1)

cs95

Reputation: 402263

merge is what you're looking for.

df

  Index     Value
0     A  0.930298
1     A  0.144550
2     B  0.393952
3     B  0.680941
4     B  0.657807
5     C  0.704954
6     C  0.733328
7     C  0.099785
8     C  0.871678

coord

  Index         x         y
0     A  0.888025  0.376416
1     B  0.052976  0.396243
2     C  0.564862  0.301380

df.merge(coord, on='Index')

  Index     Value         x         y
0     A  0.930298  0.888025  0.376416
1     A  0.144550  0.888025  0.376416
2     B  0.393952  0.052976  0.396243
3     B  0.680941  0.052976  0.396243
4     B  0.657807  0.052976  0.396243
5     C  0.704954  0.564862  0.301380
6     C  0.733328  0.564862  0.301380
7     C  0.099785  0.564862  0.301380
8     C  0.871678  0.564862  0.301380
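
For completeness, here is a self-contained sketch of the same idea. The hard-coded coordinates are copied from the printout above; the how='left' and validate='many_to_one' arguments are my additions rather than part of the one-liner, and validate assumes pandas 0.21 or later. A left merge keeps every row of df even if a place were missing from coord (its x and y would be NaN), and validate raises if coord ever contained a duplicated Index. The map variant at the end is an alternative lookup that should give the same columns.

```python
import numpy as np
import pandas as pd

# Coordinates copied from the example output above.
coord = pd.DataFrame({
    'Index': ['A', 'B', 'C'],
    'x': [0.888025, 0.052976, 0.564862],
    'y': [0.376416, 0.396243, 0.301380],
})

# Several measurements per place; the values themselves are random.
df = pd.DataFrame({
    'Index': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Value': np.random.random(9),
})

# Left merge: keep all rows of df in their original order and
# check that each Index appears at most once in coord.
merged = df.merge(coord, on='Index', how='left', validate='many_to_one')

# Alternative: look each coordinate up through a Series indexed by 'Index'.
lookup = coord.set_index('Index')
mapped = df.assign(
    x=df['Index'].map(lookup['x']),
    y=df['Index'].map(lookup['y']),
)

print(merged)
```

Note that merge returns a new DataFrame, so assign the result back (df = df.merge(...)) if you want df itself to carry the coordinates.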

Upvotes: 1
