Konn

Reputation: 106

Get information from one column to another

I started working with R recently and I am trying to find a solution for the following problem:

I have a data.frame with several columns. One of them contains file names with all the information needed. Example: "13_07_26_SpeciesA_Genotype22_Column1Row2"

I want to create new columns from the information in the name: for example, a genotype column with "22", a row column with "2", and so on.

I could do this with grepl and gsub individually as shown below:

files <- c("13_12_26_Species_Genotype22_Column1Row2",
           "15_12_26_Species_Genotype01_Column2Row5")
weights <- c(20, 40)
spreadsheet <- data.frame(files, weights)
GT22 <- grepl("Genotype22", spreadsheet$files)
spreadsheet$GT <- gsub("TRUE", "22", GT22)

But I have to check for >1000 genotypes in many files from different dates etc. So I tried to compare a vector with all possible Genotypes e.g.

 gt.list <- paste("Genotype",01:1000,sep="")

with the spreadsheet$files column using functions like match() or apply(), but I have not been able to get it running. The genotypes are not in order, so I want to compare every cell of the "files" column with all the entries in my vector and then write the matches to a new column (..., 22, 01, ...). I could then adapt this function to extract the other pieces of information.

I would be grateful for any help!

Upvotes: 0

Views: 131

Answers (1)

Roland

Reputation: 132999

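# Split each file name on "_" and bind the pieces into columns X1..X6
DF <- data.frame(
  do.call(rbind, strsplit(files, '_', fixed=TRUE)),
  weights,
  stringsAsFactors=FALSE)
# Drop the 8-character prefix "Genotype" to keep only the number
DF$GT <- substr(DF[,5], 9, nchar(DF[,5]))
# Split "Column1Row2" on "Row" and keep the part after it
DF$Row <- do.call(rbind, strsplit(DF[,6], 'Row', fixed=TRUE))[,2]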

#   X1 X2 X3      X4         X5          X6 weights GT Row
# 1 13 12 26 Species Genotype22 Column1Row2      20 22   2
# 2 15 12 26 Species Genotype01 Column2Row5      40 01   5

I am not a regex wiz.
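
For comparison, here is a minimal regex-based sketch of the same extraction using base R's sub() with a capture group. It assumes every file name contains "Genotype", "Column" and "Row", each followed directly by digits. The Column column is an addition for illustration and is not part of the output above. If a name does not match a pattern, sub() returns it unchanged, so the result may need checking on real data.

```r
files <- c("13_12_26_Species_Genotype22_Column1Row2",
           "15_12_26_Species_Genotype01_Column2Row5")
weights <- c(20, 40)
spreadsheet <- data.frame(files, weights, stringsAsFactors = FALSE)

# Replace the whole string with the digits captured after each keyword.
# Assumption: each keyword occurs once and is followed directly by digits.
spreadsheet$GT     <- sub(".*Genotype([0-9]+).*", "\\1", spreadsheet$files)
spreadsheet$Column <- sub(".*Column([0-9]+).*",   "\\1", spreadsheet$files)
spreadsheet$Row    <- sub(".*Row([0-9]+).*",      "\\1", spreadsheet$files)
```

The extracted values are character strings, which preserves leading zeros such as "01"; as.integer() could be applied afterwards if numeric values are preferred. This avoids building a list of all possible genotypes, since the pattern picks up whatever number follows the keyword.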

Upvotes: 1
