Reputation: 212
While this seemed like something that had been asked before, I have not found any information on best practices on how to perform this function.
Overview: I have two dataframes; the first is what I would call a FULL dataframe. It is the original source, so to speak. Then I have a dataframe that includes parts of the original data set, but with some new or updated information.
Goal: I want to be able to take the new dataset and apply its results to the original dataframe; overlaying its values on the rows/columns of the original data set.
Problem: Currently my problem is that I have not yet discovered the proper format using merge
to get the result I am looking for. I either only get the rows that are the same with the two dataframes, or I just get the original set back, without the new information added to it.
geoid = pandas.DataFrame({'Address': ['4050 Blake Ct', '1234 w east st'],
'ID': ['789456', '654321'],
'State': ['NV', 'NV'],
'Zip': ['88991', '88991'],
'Ph': ['789456', '456132']})
print(geoid)
Address ID State Zip Ph
0 4050 Blake Ct 789456 NV 88991 789456
1 1234 w east st 654321 NV 88991 456132
original = pandas.DataFrame({'Address': ['', '1234 w east st', 'PO box 789'],
'ID': ['789456', '654321', '654789'],
'State': ['NV', 'CA', 'CA'],
'Zip': ['88991', '88991', '99663'],
'Ph': ['789456', '456132', '741852']})
print(original)
Address ID State Zip Ph
0 789456 NV 88991 789456
1 1234 w east st 654321 NV 88991 456132
2 PO box 789 654789 CA 99663 741852
# This produces the same as the original dataframe, with no changes
df = original.merge(geoid, how='left', on=list(original.columns))
# This keeps only the rows from the geoid dataframe
df = original.merge(geoid, how='right', on=list(original.columns))
# This duplicates the data so that it includes data from both dataframes
df = original.merge(geoid, how='outer', on=list(original.columns))
# This produces an empty dataframe
df = original.merge(geoid, how='inner', on=list(original.columns))
This is the result I'm looking for:
At index value 0 in the geoid dataframe, the value 4050 Blake Ct
exists, whereas it is blank in the original dataframe. I want that value to copy over to the original dataframe, overwriting what was once there.
**Note: the blank cell in the original dataframe is an example of what could be there, but is not limited to blank cells. Ideally I want to be able to "overlay" my geoid dataframe, over the original dataframe. The index for each dataframe will always coincide with the entries. Think of the geoid dataframe as a sorted or trimmed version of the original dataframe.
Address ID State Zip Ph
0 4050 Blake Ct 789456 NV 88991 789456
1 1234 w east st 654321 NV 88991 456132
2 PO box 789 654789 CA 99663 741852
There are numerous ways that this can be done outside of a pandas function, but I feel like there must be a way to do this that is built into pandas, but perhaps I'm wrong.
Upvotes: 3
Views: 406
Reputation: 184
You can use Dataframe.update()
original.update(geoid,join='left')
output:
Address ID State Zip Ph
0 4050 Blake Ct 789456 NV 88991 789456
1 1234 w east st 654321 NV 88991 456132
2 PO box 789 654789 CA 99663 741852
Upvotes: 3
Reputation: 1038
You may use the pandas.DataFrame.update
method.
geoid = geoid.set_index('ID')
original = original.set_index('ID')
original.update(geoid)
Upvotes: 4