Reputation: 1460
I have two data frames, one with the coordinates of places
import numpy as np
import pandas as pd

coord = pd.DataFrame()
coord['Index'] = ['A','B','C']
coord['x'] = np.random.random(coord.shape[0])
coord['y'] = np.random.random(coord.shape[0])
coord
Index x y
0 A 0.888025 0.376416
1 B 0.052976 0.396243
2 C 0.564862 0.301380
and one with several values measured in the places
df = pd.DataFrame()
df['Index'] = ['A','A','B','B','B','C','C','C','C']
df['Value'] = np.random.random(df.shape[0])
df
Index Value
0 A 0.930298
1 A 0.144550
2 B 0.393952
3 B 0.680941
4 B 0.657807
5 C 0.704954
6 C 0.733328
7 C 0.099785
8 C 0.871678
I want to find an efficient way of adding the coordinates to the df data frame. So far I have tried
df['x'] = np.zeros(df.shape[0])
df['y'] = np.zeros(df.shape[0])
for i in df.Index.unique():
    df.loc[df.Index == i, 'x'] = coord.loc[coord.Index == i, 'x'].values
    df.loc[df.Index == i, 'y'] = coord.loc[coord.Index == i, 'y'].values
which works and yields
Index Value x y
0 A 0.220323 0.983739 0.121289
1 A 0.115075 0.983739 0.121289
2 B 0.432688 0.809586 0.639811
3 B 0.106178 0.809586 0.639811
4 B 0.259465 0.809586 0.639811
5 C 0.804018 0.827192 0.156095
6 C 0.552053 0.827192 0.156095
7 C 0.412345 0.827192 0.156095
8 C 0.235106 0.827192 0.156095
but this is quite sloppy, and highly inefficient. I tried to use the groupby operation like this
df['x'] = np.zeros(df.shape[0])
df['y'] = np.zeros(df.shape[0])
gb = df.groupby('Index')
for k in gb.groups.keys():
    gb.get_group(k)['x'] = coord.loc[coord.Index == k, 'x']
    gb.get_group(k)['y'] = coord.loc[coord.Index == k, 'y']
but I get this error here
/anaconda/lib/python2.7/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I understand the problem, but I don't know how to overcome it.
Any suggestions?
Upvotes: 0
Views: 164
Reputation: 402263
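merge is what you're looking for: it joins the two frames on the shared Index column, so every row of df picks up the x and y of its place.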
df
Index Value
0 A 0.930298
1 A 0.144550
2 B 0.393952
3 B 0.680941
4 B 0.657807
5 C 0.704954
6 C 0.733328
7 C 0.099785
8 C 0.871678
coord
Index x y
0 A 0.888025 0.376416
1 B 0.052976 0.396243
2 C 0.564862 0.301380
df.merge(coord, on='Index')
Index Value x y
0 A 0.930298 0.888025 0.376416
1 A 0.144550 0.888025 0.376416
2 B 0.393952 0.052976 0.396243
3 B 0.680941 0.052976 0.396243
4 B 0.657807 0.052976 0.396243
5 C 0.704954 0.564862 0.301380
6 C 0.733328 0.564862 0.301380
7 C 0.099785 0.564862 0.301380
8 C 0.871678 0.564862 0.301380
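If some places in df might be missing from coord, or coord might list a place twice, a slightly more defensive variant may be worth considering. The sketch below is only an illustration: the fixed seed and the names rng, merged, x_by_place and x_mapped are my own and not part of the question. It uses how='left' so that every row of df is kept, and validate='many_to_one' so that pandas raises an error if coord has duplicate Index values. The last two lines show a map-based lookup, which can be handy when only a single column is needed.

```python
import numpy as np
import pandas as pd

# Data shaped like the frames in the question; the seed is an assumption
# added here only to make the run repeatable.
rng = np.random.default_rng(0)
coord = pd.DataFrame({"Index": ["A", "B", "C"],
                      "x": rng.random(3),
                      "y": rng.random(3)})
df = pd.DataFrame({"Index": list("AABBBCCCC"),
                   "Value": rng.random(9)})

# how="left": keep every row of df; a place missing from coord gets NaN in x and y.
# validate="many_to_one": raise MergeError if coord has duplicate Index values.
merged = df.merge(coord, on="Index", how="left", validate="many_to_one")

# Alternative for one column: build a lookup Series keyed by place and map it.
x_by_place = coord.set_index("Index")["x"]
x_mapped = df["Index"].map(x_by_place)
```

With the default how='inner', rows of df whose Index has no match in coord would be dropped silently, which is why the left join is used in this sketch.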
Upvotes: 1