ARJ
ARJ

Reputation: 2080

Selecting columns from dataframe based on the name of other dataframe

I have 3 dataframes, df

df = pd.DataFrame({'Name': ['CTA15', 'CTA16', 'AC007', 'AC007', 'AC007'], 
'AA_ID': [22, 22, 2, 2, 2], 
'BB_ID':[4, 5, 6, 8, 9], 
'CC_ID' : [2, 2, 3, 3, 3],
  'DD_RE': [4,7,8,9,0],
    'EE_RE':[5,8,9,9,10]})

and df_ID,

df_ID = pd.DataFrame({'Name': ['CTA15', 'CTA16', 'CFV', 'SAP', 'SOS']})

and the other one isdf_RE, both of these data frames has the column Name, so I need to merge it to data frame df, then I need to select the columns based on the last part of the data frame's name. That is, for example, if the data frame is df_ID then I need all columns ending with "ID" + "Name" for all matching rows from Name from data frame df, and if the data frame id df_REL then I need I all columns ends with "RE" + "Name" from df and I wanted to save it separately. I know I could call inside the loop as,

for dfs in dataframes:

    ID=[col for col in df.columns if '_ID' in col]
    df_ID=pd.merge(df,df_ID,on='Name')
    df_ID=df_ID[ID]

But here the ID , has to change again when the data frames ends with RE and so on , I have a couple of file with different strings so any better solution would be great

So at the end I need for df_ID as having all the columns ending with ID

 df_ID = pd.DataFrame({'Name': ['CTA15', 'CTA16'],
                        'AA_ID': [22, 22'],
                         'BB_ID':[4, 5],
                           'CC_ID' : [2, 2]})

Any help would be great

Upvotes: 1

Views: 1277

Answers (2)

Scratch'N'Purr
Scratch'N'Purr

Reputation: 10399

Assuming your columns in df are Name and anything with a suffix such as the examples you have listed (e.g. _ID, _RE), then what you could do is parse through the column names to first extract all unique possible suffixes:

# since the suffixes follow a pattern of `_*`, then I can look for the `_` character
suffixes = list(set([col[-3:] for col in df.columns if '_' in col]))

Now, with the list of suffixes, you next want to create a dictionary of your existing dataframes, where the keys in the dictionary are suffixes, and the values are the dataframes with the suffix names (e.g. df_ID, df_RE):

dfs = {}
dfs['_ID'] = df_ID
dfs['_RE'] = df_RE
...  # and so forth

Now you can loop through your suffixes list to extract the appropriate columns with each suffix in the list and do the merges and column extractions:

for suffix in suffixes:
    cols = [col for col in df.columns if suffix in col]
    dfs[suffix] = pd.merge(df, dfs[suffix], on='Name')
    dfs[suffix] = dfs[suffix][cols]

Now you have your dictionary of suffixed dataframes. If you want your dataframes as separate variables instead of keeping them in your dictionary, you can now set them back as individual objects:

df_ID = dfs['_ID']
df_RE = dfs['_RE']
... # and so forth

Putting it all together in an example

import pandas as pd
df = pd.DataFrame({'Name': ['CTA15', 'CTA16', 'AC007', 'AC007', 'AC007'],
                   'AA_ID': [22, 22, 2, 2, 2],
                   'BB_ID': [4, 5, 6, 8, 9],
                   'CC_ID': [2, 2, 3, 3, 3],
                   'DD_RE': [4, 7, 8, 9, 0],
                   'EE_RE': [5, 8, 9, 9, 10]})

# Get unique suffixes
suffixes = list(set([col[-3:] for col in df.columns if '_' in col]))

dfs = {}  # dataframes dictionary

df_ID = pd.DataFrame({'Name': ['CTA15', 'CTA16', 'CFV', 'SAP', 'SOS']})
df_RE = pd.DataFrame({'Name': ['AC007']})

dfs['_ID'] = df_ID
dfs['_RE'] = df_RE

for suffix in suffixes:
    cols = [col for col in df.columns if suffix in col]
    dfs[suffix] = pd.merge(df, dfs[suffix], on='Name')
    dfs[suffix] = dfs[suffix][cols]

df_ID = dfs['_ID']
df_RE = dfs['_RE']

print(df_ID)
print(df_RE)

Result:

   AA_ID  BB_ID  CC_ID
0     22      4      2
1     22      5      2
   DD_RE  EE_RE
0      8      9
1      9      9
2      0     10

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19947

You can first merge df with df_ID and then take the columns end with ID.

pd.merge(df,df_ID,on='Name')[[e for e in df.columns if e.endswith('ID') or e=='Name']]
Out[121]: 
   AA_ID  BB_ID  CC_ID   Name
0     22      4      2  CTA15
1     22      5      2  CTA16

Similarly, this can be done for the df_RE df as well.

pd.merge(df,df_RE,on='Name')[[e for e in df.columns if e.endswith('RE') or e=='Name']]

Upvotes: 1

Related Questions