pill45

Reputation: 621

How to rename a column inside a for loop in R?

My folder contains a bunch of files whose names follow this pattern:

GSM123445_samples_table.txt
GSM129995_samples_table.txt
...
...
GSM129999_samples_table.txt

Inside each file, the table is in this pattern

Identifier     VALUE
     10001   0.12323
     10002   0.11535

To create a dataframe that includes only the information I want, I loop over a list of the files in the folder, select the ones I need, and read the table from each.

I want my dataframe to look like this

     Identifier  GSM123445  GSM129995  GSM129999  GSM130095
 1       10001     0.12323    0.14523    0.22387    0.56233
 2       10002     0.11535    0.39048    0.23437   -0.12323
 3       10006     0.12323    0.35634    0.12237   -0.12889
 4       10008     0.11535    0.23454    0.21227    0.90098

This is my code

library(dplyr)
for (file in file_list){
  if (!exists("dataset")){     # if dataset not exists, create one
     dataset <- read.table(file, header=TRUE, sep="\t") #read txt file from folder
     x <- unlist(strsplit(file, "_"))[1] # extract the GSMxxxxxx from the name of files
     dataset <- rename(dataset, x = VALUE) # rename the column
  }     
  else {
     temp_dataset <- read.table(file, header=TRUE, sep="\t") # read file
     x <- unlist(strsplit(file, "_"))[1]
     temp_dataset <- rename(temp_dataset, x = VALUE)    
     dataset<-left_join(dataset, temp_dataset, "Reporter.Identifier")
     rm(temp_dataset)
  }
}

However, this does not work, and my dataframe looks like this

     Identifier        x.x        x.y        x.x        x.y
 1       10001     0.12323    0.14523    0.22387    0.56233
 2       10002     0.11535    0.39048    0.23437   -0.12323

Obviously, the rename step failed to work.

How can I solve this problem?

Upvotes: 0

Views: 985

Answers (1)

aichao

Reputation: 7435

The issue is that rename(dataset, x = VALUE) uses the literal name x as the new column name, not the value stored in the variable x. One way to fix this is to skip rename, collect the column names in x as the files are read, and then set the column names of dataset at the end using colnames:

library(dplyr)
x <- "Identifier"  ## This will hold all column names
for (file in file_list){
  if (!exists("dataset")){     # if dataset not exists, create one
     dataset <- read.table(file, header=TRUE, sep="\t") #read txt file from folder
     x <- c(x, unlist(strsplit(file, "_"))[1]) # extract the GSMxxxxxx from the name of the file and append it to x
  }     
  else {
     temp_dataset <- read.table(file, header=TRUE, sep="\t") # read file
     x <- c(x, unlist(strsplit(file, "_"))[1])
     dataset<-left_join(dataset, temp_dataset, "Reporter.Identifier")
     rm(temp_dataset)
  }
}
colnames(dataset) <- x
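
If you would rather rename each column as soon as its file is read, here is a minimal sketch in base R. It assigns the new name through names(), so the value of the variable is used instead of the literal x. The helper read_sample, the temporary demo files, and the key column name Identifier (taken from the sample table in the question) are my assumptions, not something from your real data, so adjust them to your files. With dplyr, rename(dataset, !!x := VALUE) should have the same effect inside your original loop.

```r
# Hypothetical helper (not from the original post): read one file and
# rename its VALUE column to the GSM id taken from the file name.
read_sample <- function(file) {
  tab <- read.table(file, header = TRUE, sep = "\t")
  gsm <- unlist(strsplit(basename(file), "_"))[1]  # e.g. "GSM123445"
  names(tab)[names(tab) == "VALUE"] <- gsm         # uses the value of gsm
  tab
}

# Demo data: two small files written to a temporary folder
# (assumed layout: tab-separated, columns Identifier and VALUE).
demo_dir <- tempfile("gsm_demo")
dir.create(demo_dir)
write.table(data.frame(Identifier = c(10001, 10002),
                       VALUE = c(0.12323, 0.11535)),
            file.path(demo_dir, "GSM123445_samples_table.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(Identifier = c(10001, 10002),
                       VALUE = c(0.14523, 0.39048)),
            file.path(demo_dir, "GSM129995_samples_table.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

file_list <- list.files(demo_dir, pattern = "_samples_table\\.txt$",
                        full.names = TRUE)

# Read every file, then left-join the tables one after another on Identifier.
tables <- lapply(file_list, read_sample)
dataset <- Reduce(function(a, b) merge(a, b, by = "Identifier", all.x = TRUE),
                  tables)
print(dataset)
```

Because every table already carries its own GSM column name before the join, there are no VALUE.x / VALUE.y suffixes to clean up afterwards, and no exists("dataset") check is needed.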

Hope this helps.

Upvotes: 2
