pmr

Reputation: 1006

Renaming the columns of many data frames dynamically in R

I have a script that generates multiple data frames by scraping data from the internet:

library("rvest")
urllist <- c("https://en.wikipedia.org/wiki/Jawaharlal_Nehru",
             "https://en.wikipedia.org/wiki/Indira_Gandhi")
for (i in seq_along(urllist)) {
  url <- urllist[i]
  print(url)
  # read_html() replaces the deprecated html()
  mydata <- url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="mw-content-text"]/table[1]') %>%
    html_table()
  X <- mydata[[1]]
  # store each table as df_1, df_2, ...
  assign(paste("df", i, sep = "_"), X)
}

So it creates df_1, df_2, etc.

After the download, each of these data frames has 2 columns. The 1st column name is the person's name and the 2nd column name is NA.

How can I dynamically rename the columns of all those data frames so that the 1st column is named "ID" and the 2nd column is named after the person? My attempt below fails: it changes those strings, but it does not affect my data frames.

for(i in 1:length(urllist))
{ assign(colnames(get(paste("df", i, sep = '_')))[1],"ID")
  assign(colnames(get(paste("df", i, sep = '_')))[2],colnames(get(paste("df", i, sep = '_')))[1])
  }
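A small sketch of what seems to go wrong, using a made-up data frame rather than the scraped ones (the values and the placeholder name "V2" are assumptions for illustration only). assign() takes the name of a variable to create as its first argument, so passing it a column name creates a new variable with that name instead of renaming the column:

```r
# Toy stand-in for one scraped table (hypothetical values).
df_1 <- data.frame(x = c("Born", "Died"), y = c("1889", "1964"))
names(df_1) <- c("Jawaharlal Nehru", "V2")

# The first argument of assign() is a variable name, so this creates a
# variable called "Jawaharlal Nehru" that holds the string "ID".
assign(colnames(get("df_1"))[1], "ID")

exists("Jawaharlal Nehru")  # a new variable now exists under that name
colnames(df_1)              # the data frame's column names are unchanged
```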

My final goal is then to merge all those data frames into a single data frame based on the "ID" column. How can I do that?

Solved it this way:

for (i in seq_along(urllist)) {
  df.tmp <- get(paste("df", i, sep = "_"))
  # the old first column name (the person's name) becomes the second column name
  names(df.tmp) <- c("ID", colnames(df.tmp)[1])
  assign(paste("df", i, sep = "_"), df.tmp)
}

For the merging, I solved it this way:

# collect every df_<n> except the first one
# (the original pattern "df_[2]" would miss df_3, df_4, ...)
otherdfs <- setdiff(ls(pattern = "^df_[0-9]+$"), "df_1")
alldflist <- mget(otherdfs)
# merge all data frames by ID, starting from the first one
mergedf <- df_1
for (.df in alldflist) {
  mergedf <- merge(mergedf, .df, by = "ID", all = TRUE)
}

It works. But can anybody suggest a better way to handle the dynamic data frame names and the merge into a single data frame?

Upvotes: 0

Views: 572

Answers (1)

admccurdy

Reputation: 724

Using a list, as Roman pointed out in his comment, would definitely work in this case. But if you're already looping through your URL list, why not rename the columns inside your initial for loop, before the assign() call? Something like this:

colnames(X) <- c("ID", colnames(X)[1])

This assumes you want the original first column name to become the second column's name, which appears to be the case based on your second loop.
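For the list approach, here is a minimal sketch, assuming the tables are kept in a list instead of separate df_1, df_2 variables. The data below is a made-up stand-in for the scraped tables, and raw_tables, rename_cols and the placeholder column name "V2" are hypothetical names introduced only for this example:

```r
# Toy stand-ins for the scraped tables (hypothetical values): the first
# column is named after the person, the second has a placeholder name.
raw_tables <- list(
  setNames(data.frame(c("Born", "Died"), c("1889", "1964")),
           c("Jawaharlal Nehru", "V2")),
  setNames(data.frame(c("Born", "Died"), c("1917", "1984")),
           c("Indira Gandhi", "V2"))
)

# Same renaming as above: first column becomes "ID", second column takes
# the old first column name.
rename_cols <- function(df) {
  names(df) <- c("ID", names(df)[1])
  df
}
tables <- lapply(raw_tables, rename_cols)

# Fold the list into one data frame with a full outer join on "ID".
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), tables)
print(merged)
```

With the real data, the list could be filled inside the scraping loop (for example `tables[[i]] <- X`), which avoids assign(), get() and the ls() pattern matching altogether.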

Upvotes: 1
