Reputation: 106
I started working with R recently and I am trying to find a solution for the following problem:
I have a data.frame with several columns. One of them contains file names that hold all the information I need. Example: "13_07_26_SpeciesA_Genotype22_Column1Row2"
I want to create new columns from the information in the name: for example, a genotype column with "22", a row column with "2", and so on.
I could do this with grepl and gsub individually, as shown below:
files <- c("13_12_26_Species_Genotype22_Column1Row2",
"15_12_26_Species_Genotype01_Column2Row5")
weights <- c(20,40)
spreadsheet <- data.frame(files,weights)
GT22 <- grepl("Genotype22", spreadsheet$files)
spreadsheet$GT <- gsub("TRUE","22",GT22)
But I have to check for more than 1000 genotypes in many files from different dates, etc. So I tried to compare a vector of all possible genotypes, e.g.
gt.list <- paste("Genotype",01:1000,sep="")
with the spreadsheet$files column using functions like match() or apply(), but I have not been able to get it running. The genotypes are not in order, so I want to compare every cell of the "files" column with all the entries of my vector and then write all the matches into a new column (...22,01,...). I could then rewrite this function for the other pieces of information.
I would be grateful for any help!
Upvotes: 0
Views: 131
Reputation: 132999
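# files and weights are the vectors from the question.
# Split each file name on "_" and bind the pieces into columns X1..X6.
DF <- data.frame(
  do.call(rbind, strsplit(files, '_', fixed = TRUE)),
  weights,
  stringsAsFactors = FALSE)
# Drop the 8-character prefix "Genotype" from X5, keeping only the number.
DF$GT <- substr(DF[, 5], 9, nchar(DF[, 5]))
# Split X6 on "Row" and keep the part after it.
DF$Row <- do.call(rbind, strsplit(DF[, 6], 'Row', fixed = TRUE))[, 2]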
#   X1 X2 X3      X4         X5          X6 weights GT Row
# 1 13 12 26 Species Genotype22 Column1Row2      20 22   2
# 2 15 12 26 Species Genotype01 Column2Row5      40 01   5
I am not a regex wiz.
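For comparison, a regex-based variant might look like the sketch below. It assumes every file name ends in the form `_Genotype<digits>_Column<digits>Row<digits>`, as in the two examples from the question; the `pattern` variable and the extra `Column` field are only illustrative and not part of the original code.

```r
# Example data from the question
files <- c("13_12_26_Species_Genotype22_Column1Row2",
           "15_12_26_Species_Genotype01_Column2Row5")
weights <- c(20, 40)
spreadsheet <- data.frame(files, weights, stringsAsFactors = FALSE)

# One pattern with three capture groups: genotype, column and row numbers.
# Assumption: all file names follow this layout.
pattern <- "^.*_Genotype([0-9]+)_Column([0-9]+)Row([0-9]+)$"

# sub() replaces the whole match with the chosen capture group.
# The results stay character, so a leading zero such as "01" is preserved.
spreadsheet$GT     <- sub(pattern, "\\1", spreadsheet$files)
spreadsheet$Column <- sub(pattern, "\\2", spreadsheet$files)
spreadsheet$Row    <- sub(pattern, "\\3", spreadsheet$files)
```

This avoids listing all possible genotypes in advance, since the digits are taken from whatever follows "Genotype". One caveat: sub() returns a string unchanged when the pattern does not match, so file names with a different layout would need to be checked separately, for example with grepl(pattern, spreadsheet$files).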
Upvotes: 1