Reputation: 1247
I have 3 dataframes where several columns are same. I want to make some of them unique. And remaining of them should be dropped.
df1.columns = ['payeeId', 'name','amount','Category','Rule_No','Short Rule Name','Incident','Flag_Type']
df2.columns = ['payeeId', 'name','amount','Category','Rule_No','Short Rule Name','Incident','Flag_Type']
df3.columns = ['payeeId', 'name','amount','Category','Rule_No','Short Rule Name','Incident','Flag_Type']
df4.columns = ['payeeId', 'name','amount','Category','Rule_No','Short Rule Name','Incident','Flag_Type']
Now I want Short Rule Name
, Incident
, Rule_No
and Flag_Type
to be renamed with a suffix, say, df1, df2 and df3. Remaining columns can be dropped from combined df.
If I do the following:
dfs = [df1,df2,df3]
df_f = pd.concat(dfs,axis=1)
df_f = df_f.loc[:,~df_f.columns.duplicated()]
This will remove all duplicated columns and will be left with one set of columns (for df1).
So one way to prevent the unwanted deletion of column could be renaming the selected columns as indicated above.
How can I achieve the same. I want programmatically adding suffix to those selected columns.
Upvotes: 0
Views: 226
Reputation: 863301
You can rename
columns from list in list comprehension with enumerate
for counter:
cols = ["Short Rule Name", "Incident", "Rule_No", "Flag_Type"]
dfs = [df1, df2, df3]
dfs = [df.rename(columns=dict(zip(cols, [f'{x}_{i}' for x in cols])))
for i, df in enumerate(dfs, 1)]
df_f = pd.concat(dfs,axis=1)
df_f = df_f.loc[:,~df_f.columns.duplicated()]
Sample for first part:
cols = ["Short Rule Name", "Incident", "Rule_No", "Flag_Type"]
df1 = pd.DataFrame(columns=cols)
df2 = pd.DataFrame(columns=cols + ['a', 'b'])
df3 = pd.DataFrame(columns=cols + ['a', 'c'])
df4 = pd.DataFrame(columns=cols + ['b', 'c'])
dfs = [df1, df2, df3, df4]
dfs = [df.rename(columns=dict(zip(cols, [f'{x}_{i}' for x in cols])))
for i, df in enumerate(dfs, 1)]
print (dfs)
[Empty DataFrame
Columns: [Short Rule Name_1, Incident_1, Rule_No_1, Flag_Type_1]
Index: [], Empty DataFrame
Columns: [Short Rule Name_2, Incident_2, Rule_No_2, Flag_Type_2, a, b]
Index: [], Empty DataFrame
Columns: [Short Rule Name_3, Incident_3, Rule_No_3, Flag_Type_3, a, c]
Index: [], Empty DataFrame
Columns: [Short Rule Name_4, Incident_4, Rule_No_4, Flag_Type_4, b, c]
Index: []]
Upvotes: 1
Reputation: 11
You can add a new column with the content of the old one and a new label like:
df1["Short Rule Name df1"] = df1["Short Rule Name"]
Then drop the old column using drop()
And to do it programmatically, you can pop it in a for loop
dfs = [df1, df2, df3]
i=1
for df in dfs:
df["Short Rule Name df" + str(i)] = df["Short Rule Name"]
df.drop("Short Rule Name", axis=1, inplace=True)
i+=1
Upvotes: 0