Reputation: 397
I am trying to update my dataframe by conditionally populating columns. I want to compare a value in dataframe row with that of a tuple and then populate a different column of the same dataframe row with another value from the tuple.
For example:
foo = pd.DataFrame({"TIME":([1,1,2,2,3,3,4,4,5,5,6,6]),
"PLACE": (["place1","place2","place1","place2","place1","place2","place1","place2","place1","place2","place1","place2"]),
"Xcords" :(["","","","","","","","","","","",""]),
"Ycords" :(["","","","","","","","","","","",""])})
and a tuple with the place and its x and y coordinates:
bar = [('place1','1','11'),('place3','3','33'),('place2','2','22')]
In the end I would like to have the following:
PLACE TIME Xcords Ycords
0 place1 1 1 11
1 place2 1 2 22
2 place1 2 1 11
3 place2 2 2 22
4 place1 3 1 11
5 place2 3 2 22
6 place1 4 1 11
7 place2 4 2 22
8 place1 5 1 11
9 place2 5 2 22
10 place1 6 1 11
11 place2 6 2 22
so where the dataframe "PLACE" column value matches the tuple first value in that same dataframe row the Xcords and Ycords should be populated with the tuple's 2nd and 3rd value. This should happen for all instances since they might be appearing more than one time.
What would be the correct syntax for something like this? I have tried this but then all my X_cords and Y_cords rows end up with the same value:
for i in bar:
if any(foo["PLACE"]==i[0]):
foo.X_cords = i[1]
foo.Y_cords = i[2]
Also can a for loop be avoided since both the datasets are very large?
Upvotes: 1
Views: 706
Reputation: 210932
is that what you want?
In [191]: bar_df = pd.DataFrame(bar, columns=['PLACE','Xcords','Ycords'])
In [192]: bar_df
Out[192]:
PLACE Xcords Ycords
0 place1 1 10
1 place3 3 30
2 place2 2 20
In [193]: pd.merge(foo[['PLACE','TIME']], bar_df, on='PLACE', how='left')
Out[193]:
PLACE TIME Xcords Ycords
0 place1 1 1 10
1 place2 1 2 20
2 place1 2 1 10
3 place2 2 2 20
4 place1 3 1 10
5 place2 3 2 20
6 place1 4 1 10
7 place2 4 2 20
8 place1 5 1 10
9 place2 5 2 20
10 place1 6 1 10
11 place2 6 2 20
or as @Alexander mentioned:
In [235]: foo[['PLACE','TIME']].merge(bar_df, on='PLACE', how='left')
Out[235]:
PLACE TIME Xcords Ycords
0 place1 1 1 10
1 place2 1 2 20
2 place1 2 1 10
3 place2 2 2 20
4 place1 3 1 10
5 place2 3 2 20
6 place1 4 1 10
7 place2 4 2 20
8 place1 5 1 10
9 place2 5 2 20
10 place1 6 1 10
11 place2 6 2 20
Upvotes: 4