Jarad
Jarad

Reputation: 18913

Pandas: Update Multiple Dataframe Columns Using Duplicate Rows From Another Dataframe

I have two dataframes: df1 and df2. If the Display Name from df2 is in df1's Display Name column, I want to assign df1s Type, Format, Behavior, Datatype values to df2s values.

I've tried merge every way I could think of. I think loc is my best hope but I can't seem to get the assignment syntax right. Also, I'm looking for a succinct answer - preferably a one-liner.

Something like this:

df2.loc[df2['Display Name'].isin(df1['Display Name']), /
        ['Type', 'Format', 'Behavior', 'Datatype']] = ???

My Code:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(
  {'Behavior': ['Attribute', 'Attribute', 'Attribute', 'Attribute', 'Attribute',
              'Attribute', 'Attribute', 'Metric', 'Metric', 'Metric', 'Metric',
              'Metric', 'Metric', 'Metric', 'Metric'],
   'Datatype': ['object', 'object', 'object', 'object', 'object', 'object',
              'object', 'int64', 'int64', 'int64', 'int64', 'float64',
              'float64', 'float64', 'float64'],
   'Display Name': ['Campaign', 'Campaign ID', 'Campaign ID', 'Campaign state',
                  'Campaign state', 'Currency', 'Currency', 'Impressions',
                  'Impressions', 'Clicks', 'Clicks', 'CTR', 'CTR', 'Avg. CPC',
                  'Avg. CPC'],
   'Format': ['{}', '{}', '{}', '{}', '{}', '{}', '{}', '{:,.0f}', '{:,.0f}',
            '{:,.0f}', '{:,.0f}', '{:.2f}%', '{:.2f}%', '${:,.2f}', '${:,.2f}'],
   'Type': ['String', 'String', 'String', 'String', 'String', 'String', 'String',
          'Integer', 'Integer', 'Integer', 'Integer', 'Percent', 'Percent',
          'Currency', 'Currency']},
  columns=['Display Name', 'Type', 'Format', 'Behavior', 'Datatype'])

df2 = pd.DataFrame(
  { 'Behavior': [ 'Attribute', 'Metric', 'Metric', 'Metric', 'Attribute',
                'Metric', 'Metric', 'Attribute', 'Metric', 'Metric', 'Metric'],
  'Datatype': [ 'object', 'float64', 'float64', 'float64', 'object', 'int64',
                'int64', 'object', 'float64', 'float64', 'float64'],
  'Display Name': [ 'Match type', 'Destination URL', 'Final URL',
                    'Mobile final URL', 'Labels', 'Impressions', 'Clicks',
                    'CTR', 'Avg. CPC', 'Cost', 'Avg. position'],
  'Format': [ '{}', '{:.2f}', '{:.2f}', '{:.2f}', '{}', '{:,.0f}', '{:,.0f}',
              '{}', '{:.2f}', '{:.2f}', '{:.2f}'],
  'Type': [ 'String', 'Float', 'Float', 'Float', 'String', 'Integer',
            'Integer', 'String', 'Float', 'Float', 'Float']},
  columns=['Display Name', 'Type', 'Format', 'Behavior', 'Datatype'])

df2_vals_in_df1 = df2.loc[df2['Display Name'].isin(df1['Display Name']), df2.columns[:]]
df1_vals_in_df2 = df1.loc[df1['Display Name'].isin(df2['Display Name']), df1.columns[:]]

What it looks like:

>>> df1
      Display Name      Type    Format   Behavior Datatype
0         Campaign    String        {}  Attribute   object
1      Campaign ID    String        {}  Attribute   object
2      Campaign ID    String        {}  Attribute   object
3   Campaign state    String        {}  Attribute   object
4   Campaign state    String        {}  Attribute   object
5         Currency    String        {}  Attribute   object
6         Currency    String        {}  Attribute   object
7      Impressions   Integer   {:,.0f}     Metric    int64
8      Impressions   Integer   {:,.0f}     Metric    int64
9           Clicks   Integer   {:,.0f}     Metric    int64
10          Clicks   Integer   {:,.0f}     Metric    int64
11             CTR   Percent   {:.2f}%     Metric  float64
12             CTR   Percent   {:.2f}%     Metric  float64
13        Avg. CPC  Currency  ${:,.2f}     Metric  float64
14        Avg. CPC  Currency  ${:,.2f}     Metric  float64

>>> df2
        Display Name     Type   Format   Behavior Datatype
0         Match type   String       {}  Attribute   object
1    Destination URL    Float   {:.2f}     Metric  float64
2          Final URL    Float   {:.2f}     Metric  float64
3   Mobile final URL    Float   {:.2f}     Metric  float64
4             Labels   String       {}  Attribute   object
5        Impressions  Integer  {:,.0f}     Metric    int64
6             Clicks  Integer  {:,.0f}     Metric    int64
7                CTR   String       {}  Attribute   object
8           Avg. CPC    Float   {:.2f}     Metric  float64
9               Cost    Float   {:.2f}     Metric  float64
10     Avg. position    Float   {:.2f}     Metric  float64

>>> df2_vals_in_df1
  Display Name     Type   Format   Behavior Datatype
5  Impressions  Integer  {:,.0f}     Metric    int64
6       Clicks  Integer  {:,.0f}     Metric    int64
7          CTR   String       {}  Attribute   object
8     Avg. CPC    Float   {:.2f}     Metric  float64

>>> df1_vals_in_df2
   Display Name      Type    Format Behavior Datatype
7   Impressions   Integer   {:,.0f}   Metric    int64
8   Impressions   Integer   {:,.0f}   Metric    int64
9        Clicks   Integer   {:,.0f}   Metric    int64
10       Clicks   Integer   {:,.0f}   Metric    int64
11          CTR   Percent   {:.2f}%   Metric  float64
12          CTR   Percent   {:.2f}%   Metric  float64
13     Avg. CPC  Currency  ${:,.2f}   Metric  float64
14     Avg. CPC  Currency  ${:,.2f}   Metric  float64

Note how df1_vals_in_df2 Display Name might have the same name multiple times. Their Type, Format, Behavior, Datatype values will always be the same values in both rows.

Expected Output of df2:

>>> df2
        Display Name     Type   Format   Behavior Datatype
0         Match type   String       {}  Attribute   object
1    Destination URL    Float   {:.2f}     Metric  float64
2          Final URL    Float   {:.2f}     Metric  float64
3   Mobile final URL    Float   {:.2f}     Metric  float64
4             Labels   String       {}  Attribute   object
5        Impressions  Integer  {:,.0f}     Metric    int64 <-- same
6             Clicks  Integer  {:,.0f}     Metric    int64 <-- same
7                CTR  Percent  {:.2f}%     Metric  float64 <-- changed
8           Avg. CPC Currency ${:,.2f}     Metric  float64 <-- changed
9               Cost    Float   {:.2f}     Metric  float64
10     Avg. position    Float   {:.2f}     Metric  float64

Takeaway #1: rows 5, 6 are the same because they are the same in both df1 and df2.

Takeaway #2: row 7, changed from String, {}, Attribute, object to Percent, {:.2f}%, Metric, float64 - the row values from df1 because Display Name from df2 was found in Display Name in df1.

Takeaway #3: row 8, changed for the same reasons noted in takeaway #2.

Tried:

Q1:Python Pandas: Merge or Filter DataFrame by Another. Is there a Better Way?

Doesn't address this question because I'm not trying to create a new dataframe; I'm trying to replace values in an existing dataframe from another.

Q2:Replace column values based on another dataframe python pandas - better way?

Doesn't address this question because that example contains one df with correct values whereas my situation is a df with correct and incorrect values.

Apologies that this is such a long question. I just wanted to provide enough context.

Upvotes: 2

Views: 1618

Answers (2)

hilberts_drinking_problem
hilberts_drinking_problem

Reputation: 11602

I think that combine_first will be an elegant solution, as per JohnE, provided you set Display Name as an index. This brings me to another point. I think that your task is well-defined only if 'Display Name' corresponds to exactly one set of attributes within each table. Assuming that, you can drop duplicates, set index and use .update like so:

df1 = df1.drop_duplicates()

df1 = df1.set_index('Display Name')
df2 = df2.set_index('Display Name')

df2_c = df2.copy()

df2.update(df1)
df1.update(df2_c)

del df2_c

You can reset the dimensions of df1 with an auxiliary index if you like.

Upvotes: 2

Jarad
Jarad

Reputation: 18913

Not ideal but I was able to recreate my expected output. Problem is, I wanted to avoid making df3 and want to do the replacement all within df2 so this isn't ideal.

df2 before:

        Display Name     Type   Format   Behavior Datatype
0         Match type   String       {}  Attribute   object
1    Destination URL    Float   {:.2f}     Metric  float64
2          Final URL    Float   {:.2f}     Metric  float64
3   Mobile final URL    Float   {:.2f}     Metric  float64
4             Labels   String       {}  Attribute   object
5        Impressions  Integer  {:,.0f}     Metric    int64
6             Clicks  Integer  {:,.0f}     Metric    int64
7                CTR   String       {}  Attribute   object
8           Avg. CPC    Float   {:.2f}     Metric  float64
9               Cost    Float   {:.2f}     Metric  float64
10     Avg. position    Float   {:.2f}     Metric  float64

df3 after:

df3 = df2.combine_first(df1).drop_duplicates('Display Name', keep='last')
df3 = df3.set_index(df3['Display Name'].map(dict(zip(df2['Display Name'], df2.index)))).sort_index().reset_index(drop=True)

        Display Name      Type    Format   Behavior Datatype
0         Match type    String        {}  Attribute   object
1    Destination URL     Float    {:.2f}     Metric  float64
2          Final URL     Float    {:.2f}     Metric  float64
3   Mobile final URL     Float    {:.2f}     Metric  float64
4             Labels    String        {}  Attribute   object
5        Impressions   Integer   {:,.0f}     Metric    int64
6             Clicks   Integer   {:,.0f}     Metric    int64
7                CTR   Percent   {:.2f}%     Metric  float64
8           Avg. CPC  Currency  ${:,.2f}     Metric  float64
9               Cost     Float    {:.2f}     Metric  float64
10     Avg. position     Float    {:.2f}     Metric  float64

df2 before and after comparison:

   Display Name   Type Format Behavior Datatype
0          True   True   True     True     True
1          True   True   True     True     True
2          True   True   True     True     True
3          True   True   True     True     True
4          True   True   True     True     True
5          True   True   True     True     True
6          True   True   True     True     True
7          True  False  False    False    False
8          True  False  False     True     True
9          True   True   True     True     True
10         True   True   True     True     True

Upvotes: 0

Related Questions