user12221521
user12221521

Reputation: 1

How to rename multiple columns with different column names and different order in several dataframes based on a dictionary in R

I am working on merging multiple datasets from different sources. The column names from each dataset (as datframes) have different names and are in different orders. I have created a dictionary that contains all the different names and the common name I want to rename the original names with. How do I rename the original column names using the dictionary in R? I specifically want to use a dictionary because I may add more datasets (with different column names) in the future and it would be easy to adapt the dictionary.

I know I can manually rename every column but there are many (like 30) and they may change with the addition of new datasets.

df1 <- data.frame(site = c(1:6), code = c(rep("A",3), rep("B", 3)), result = c(20:25))
df2 <- data.frame(site_no = c(10:19), day = c(1:10), test = c(rep("A", 5), rep("B", 5)), value = c(1:10))
dict <- data.frame(oldName = c("site", "code", "result", "site_no", "day", "test", "value"),  newName = c("site_number", "parameter", "result", "site_number", "day", "parameter", "result"))

I would like to rename the columns in df1 and df2 based on the dict dataframe, which contains the old names (all the column names from df1 and df2) and the new names (the common names to use).

The result would be:

colnames(df1)
"site_number" "parameter" "result"

colnames(df2)
"site_number" "day" "parameter" "result"

Upvotes: 0

Views: 999

Answers (2)

akrun
akrun

Reputation: 887068

We can use rename_all after placing the datasets in a list. It is better to have those datasets in a list instead of having them in the global environment

library(dplyr)
library(purrr)
out <- mget(ls(pattern = "^df\\d+$")) %>%
       map(~ .x %>% 
         rename_all(~  as.character(dict$newName)[match(., dict$oldName)]))

If we want, we can can change the column names in the original object with list2env

list2env(out, .GlobalEnv)
names(df1)
#[1] "site_number" "parameter"   "result"     

names(df2)
#[1] "site_number" "day"         "parameter"   "result"     

Upvotes: 1

Onyambu
Onyambu

Reputation: 79208

We can match the names of the respective df to the oldname, then extract the newname at the matched indices:

names(df1) = with(dict,newName[match(names(df1),oldName)])
names(df2) = with(dict,newName[match(names(df2),oldName)])
print(df1)
print(df2)

Upvotes: 3

Related Questions