Reputation: 47
I would like to subset my data frame by selecting the columns whose names partially match a string. This works when I have a single name to match. The data frame is:
ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA
1910 NA NA NA NA NA NA NA NA
1911 NA NA NA NA NA NA NA NA
1912 NA NA NA NA NA NA NA NA
1913 NA NA NA NA NA NA NA NA
library(stringr)
df[str_detect(names(df), "ABBA" )]
works, and returns:
ABBA01A ABBA01B ABBA02A ABBA02B
1908 NA NA NA NA
So, I would like to create a dataframe for each of my species:
Speciesnames=unique ( substring (names(df),0, 4))
Speciesnames
[1] "ABBA" "ACRU" "ARCU" "PIAB" "PIGL"
I have tried to write a loop and use [i] as the species name, but the str_detect function does not recognise it. I would also like to add additional calculations inside the loop:
for ( i in seq_along(Speciesnames)){
df=df[str_detect(names(df), pattern =[i])]
print(df)
#my function for the subsetted dataframe
}
thank you for your help!
Upvotes: 1
Views: 290
Reputation: 20085
One option is to use mapply with SIMPLIFY = FALSE, which returns a list of data frames, one per species. The base R function startsWith selects the columns whose names start with the species name.
# First find the species by taking the unique first 4 characters of the column names
species <- unique(gsub("([A-Z]{4}).*", "\\1", names(df)))
# Pass each species
listOfDFs <- mapply(function(x){
  # Keep only the columns starting with the species name;
  # drop = FALSE keeps a data frame even if a single column matches
  df[, startsWith(names(df), x), drop = FALSE]
}, species, SIMPLIFY = FALSE)
listOfDFs
# $ABBA
# ABBA01A ABBA01B ABBA02A ABBA02B
# 1908 NA NA NA NA
# 1909 NA NA NA NA
# 1910 NA NA NA NA
# 1911 NA NA NA NA
# 1912 NA NA NA NA
# 1913 NA NA NA NA
#
# $ACRU
# ACRU01A ACRU01B ACRU02A ACRU02B
# 1908 NA NA NA NA
# 1909 NA NA NA NA
# 1910 NA NA NA NA
# 1911 NA NA NA NA
# 1912 NA NA NA NA
# 1913 NA NA NA NA
Data:
df <- read.table(text =
"ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA
1910 NA NA NA NA NA NA NA NA
1911 NA NA NA NA NA NA NA NA
1912 NA NA NA NA NA NA NA NA
1913 NA NA NA NA NA NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
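If you prefer to keep the for loop from the question, a minimal sketch could look like the one below. The two changes are that the pattern is Speciesnames[i] rather than [i], and that the subset goes into a new object so df is not overwritten on the first pass. The names sub_df, results and count_missing are made up for illustration (count_missing stands in for your own function), and only two rows of the data are used to keep it short.

```r
# Example data: the first two rows from the question (all values are NA).
df <- read.table(text =
"ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

Speciesnames <- unique(substring(names(df), 1, 4))

# Assumption: a placeholder for "my function for the subsetted dataframe".
count_missing <- function(d) sum(is.na(d))

results <- list()
for (i in seq_along(Speciesnames)) {
  sp <- Speciesnames[i]  # the species name itself, not [i]
  # Store the subset under a new name so df stays intact for the next pass.
  sub_df <- df[, startsWith(names(df), sp), drop = FALSE]
  results[[sp]] <- count_missing(sub_df)
}
results
```

The same loop should work with str_detect(names(df), sp) in place of startsWith, though that matches the name anywhere in the column name rather than only at the start.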
Upvotes: 1
Reputation: 23598
Using your data you could do the following. It builds a list with one data.frame per species and then brings all the data.frames out of the list into the global environment:
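library(dplyr)  # provides %>%, select() and starts_with()

Speciesnames <- unique(substring(names(df), 1, 4))
data <- vector("list", length(Speciesnames))
for(i in seq_along(Speciesnames)) {
  # Keep only the columns whose names start with the current species
  data[[i]] <- df %>% select(starts_with(Speciesnames[i]))
}
names(data) <- Speciesnames
# Create one object per list element in the global environment
list2env(data, envir = globalenv())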
The end result after list2env is 2 data.frames called "ABBA" and "ACRU", which you can then access directly. If further manipulation is needed, you might leave everything in the list and do it there.
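As a rough sketch of what working inside the list could look like, the example below applies one calculation to every species with lapply. It uses base R only, so it does not depend on dplyr. The calculation (counting non-missing values per year) and the name n_obs are assumptions chosen for illustration, not something from the question.

```r
# Example data: the first two rows from the question (all values are NA).
df <- read.table(text =
"ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

Speciesnames <- unique(substring(names(df), 1, 4))

# One data.frame per species, kept together in a named list.
data <- lapply(Speciesnames, function(sp) {
  df[, startsWith(names(df), sp), drop = FALSE]
})
names(data) <- Speciesnames

# Hypothetical per-species calculation: non-missing values per year (row).
n_obs <- lapply(data, function(d) rowSums(!is.na(d)))
n_obs$ABBA
```

Keeping the results in a list with the same names makes it easy to look up a species later, for example n_obs[["ACRU"]].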
Upvotes: 1
Reputation: 109
I think that you should select all matching columns first, and then subset your data.frame.
patterns <- c("ABB", "CDC")
res <- lapply(patterns, function(x) grep(x, colnames(df), value=TRUE))
df[, unique(unlist(res))]
The res object is a list of the matched column names for each pattern. The next step is to take the unique set of columns, unique(unlist(res)), and subset the data.frame with it.
If you are writing production code, this is probably not the best answer.
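Applied to the column names from the question, the idea might look like the sketch below. The patterns "ABBA" and "ACRU" come from the question. Anchoring them with "^" is an addition here, on the assumption that the species code always sits at the start of the column name.

```r
# Example data: the first two rows from the question (all values are NA).
df <- read.table(text =
"ABBA01A ABBA01B ABBA02A ABBA02B ACRU01A ACRU01B ACRU02A ACRU02B
1908 NA NA NA NA NA NA NA NA
1909 NA NA NA NA NA NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

patterns <- c("ABBA", "ACRU")

# "^" anchors each pattern at the start of the column name.
res <- lapply(patterns, function(x) {
  grep(paste0("^", x), colnames(df), value = TRUE)
})
names(res) <- patterns

# Columns for a single species; drop = FALSE keeps a data.frame.
df[, res$ABBA, drop = FALSE]
```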
Upvotes: 0