stgtscc
stgtscc

Reputation: 990

pandas join DataFrame force suffix?

How can I force a suffix on a merge or join. I understand it's possible to provide one if there is a collision but in my case I'm merging df1 with df2 which doesn't cause any collision but then merging again on df2 which uses the suffixes but I would prefer for each merge to have a suffix because it gets confusing if I do different combinations as you could imagine.

Upvotes: 45

Views: 56989

Answers (4)

Andy Hayden
Andy Hayden

Reputation: 375675

You could force a suffix on the actual DataFrame:

In [11]: df_a = pd.DataFrame([[1], [2]], columns=['A'])

In [12]: df_b = pd.DataFrame([[3], [4]], columns=['B'])

In [13]: df_a.join(df_b)
Out[13]: 
   A  B
0  1  3
1  2  4

By appending to its column's names:

In [14]: df_a.columns = df_a.columns.map(lambda x: str(x) + '_a')

In [15]: df_a
Out[15]: 
   A_a
0    1
1    2

Now joins won't need the suffix correction, whether they collide or not:

In [16]: df_b.columns = df_b.columns.map(lambda x: str(x) + '_b')

In [17]: df_a.join(df_b)
Out[17]: 
   A_a  B_b
0    1    3
1    2    4

Upvotes: 34

Renier Botha
Renier Botha

Reputation: 980

As of pandas version 0.24.2 you can add a suffix to column names on a DataFrame using the add_suffix method.

This makes a one-liner merge command with force-suffix more bearable, for example:


df_merged = df1.merge(df2.add_suffix('_2'))

Upvotes: 28

T.C. Proctor
T.C. Proctor

Reputation: 6486

This is what I've been using to pandas.merge two DataFrames and force suffixing:

def merge_force_suffix(left, right, **kwargs):
    on_col = kwargs['on']
    suffix_tupple = kwargs['suffixes']

    def suffix_col(col, suffix):
        if col != on_col:
            return str(col) + suffix
        else:
            return col

    left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tupple[0]))
    right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tupple[1]))
    del kwargs['suffixes']
    return pd.merge(left_suffixed, right_suffixed, **kwargs)

Upvotes: 1

thebeancounter
thebeancounter

Reputation: 4839

Pandas merge will give the new columns a suffix when there is already a column with the same name, When i need to force the new columns with a suffix, i create an empty column with the name of the column that i want to join.

df["colName"] = "" #create empty column 
df.merge(right = "df1", suffixes = ("_a","_b"))

You can later drop the empty column.

You could do the same for more than one columns, or for every column in df.columns.values

Upvotes: 10

Related Questions