spareTimeCoder
spareTimeCoder

Reputation: 212

How to merge matching indices with two pandas dataframes

While this seemed like something that had been asked before, I have not found any information on best practices on how to perform this function.

Overview: I have two dataframes; the first is what I would call a FULL dataframe. It is the original source, so to speak. Then I have a dataframe that includes parts of the original data set, but with some new or updated information.

Goal: I want to be able to take the new dataset and apply its results to the original dataframe; overlaying its values on the rows/columns of the original data set.

Problem: Currently my problem is that I have not yet discovered the proper format using merge to get the result I am looking for. I either only get the rows that are the same with the two dataframes, or I just get the original set back, without the new information added to it.

geoid = pandas.DataFrame({'Address': ['4050 Blake Ct', '1234 w east st'], 
                            'ID': ['789456', '654321'], 
                            'State': ['NV', 'NV'], 
                            'Zip': ['88991', '88991'], 
                            'Ph': ['789456', '456132']})
print(geoid)

          Address      ID State    Zip      Ph
0   4050 Blake Ct  789456    NV  88991  789456
1  1234 w east st  654321    NV  88991  456132


original = pandas.DataFrame({'Address': ['', '1234 w east st', 'PO box 789'], 
                            'ID': ['789456', '654321', '654789'], 
                            'State': ['NV', 'CA', 'CA'], 
                            'Zip': ['88991', '88991', '99663'], 
                            'Ph': ['789456', '456132', '741852']})
print(original)

          Address      ID State    Zip      Ph
0                  789456    NV  88991  789456
1  1234 w east st  654321    NV  88991  456132
2      PO box 789  654789    CA  99663  741852


# This produces the same as the original dataframe, with no changes
df = original.merge(geoid, how='left', on=list(original.columns))

# This keeps only the rows from the geoid dataframe
df = original.merge(geoid, how='right', on=list(original.columns))

# This duplicates the data so that it includes data from both dataframes
df = original.merge(geoid, how='outer', on=list(original.columns))

# This produces an empty dataframe
df = original.merge(geoid, how='inner', on=list(original.columns))

This is the result I'm looking for: At index value 0 in the geoid dataframe, the value 4050 Blake Ct exists, whereas it is blank in the original dataframe. I want that value to copy over to the original dataframe, overwriting what was once there. **Note: the blank cell in the original dataframe is an example of what could be there, but is not limited to blank cells. Ideally I want to be able to "overlay" my geoid dataframe, over the original dataframe. The index for each dataframe will always coincide with the entries. Think of the geoid dataframe as a sorted or trimmed version of the original dataframe.

    Address      ID State    Zip      Ph
0  4050 Blake Ct   789456    NV  88991  789456
1  1234 w east st  654321    NV  88991  456132
2      PO box 789  654789    CA  99663  741852

There are numerous ways that this can be done outside of a pandas function, but I feel like there must be a way to do this that is built into pandas, but perhaps I'm wrong.

Upvotes: 3

Views: 406

Answers (2)

Rajat Agarwal
Rajat Agarwal

Reputation: 184

You can use Dataframe.update()

original.update(geoid,join='left')

output:

          Address      ID State    Zip      Ph
0   4050 Blake Ct  789456    NV  88991  789456
1  1234 w east st  654321    NV  88991  456132
2      PO box 789  654789    CA  99663  741852

Upvotes: 3

kelvt
kelvt

Reputation: 1038

You may use the pandas.DataFrame.update method.

geoid = geoid.set_index('ID')
original = original.set_index('ID')

original.update(geoid)

Upvotes: 4

Related Questions