Reputation: 137
I want to extract, from each protein sequence in seq.df (a single-column data frame), the characters at the positions given by the corresponding element of map.list (a list of lists).
Example data:
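seq.df <- rbind.data.frame("MTHISPAVYGLWAIMSVLLAAFCAY",
                           "MERSSAIVFPNVGTSVLSATIHLVGVTVLAHLISRRTALRGTST",
                           "MLFEPFWCLLDLLRWSLDTHYIPAKRPLNGGGRSSNFD")
# use `=` to name the sublists; `a <- list(...)` inside list() would instead
# create a global variable `a` and leave the element unnamed
map.list <- list(a = list(2,3,4,5,6,7),
                 b = list(13,14,30,31,32),
                 c = list(5,6,10,11))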
Desired output:
THISPA
GTAHL
PFLD
If I run a nested apply over just the first sublist of map.list, I get what I want for the first protein:
prot.list<- apply(seq.df, 1, function (x) lapply(map.list[[1]], function (y) substring(x, y, y)))
This returns the expected result for the first sequence (THISPA).
But I'm unsure how to get this function to iterate over all the sublists in map.list. I tried to wrap this into a for loop, but it does not give me the expected result:
for (i in seq_along(map.list)){
each.map.list<- map.list[[i]]
prot.list<- apply(seq.df, 1, function (x) lapply(each.map.list, function (y) substring(x, y, y)))
}
Output:
SPGL
SAPN
PFLD
I'd much rather add another lapply call, but I'm not sure how to refer to each sublist of map.list.
#this does not work, but something like:
prot.list<- apply(seq.df, 1, function (x) lapply(map.list, function (y) lapply([[y]], function (z) substring(x, z, z)))
Upvotes: 2
Views: 142
Reputation: 887951
We can use Map to pair each sequence with its own list of positions:
unlist(Map(function(x, y) paste(substring(x, unlist(y),
unlist(y)), collapse=""), seq.df[[1]], map.list))
#[1] "THISPA" "GTAHL" "PFLD"
Also, instead of calling unlist twice, we can unlist once at the beginning and use the flattened list as input:
l1 <- lapply(map.list, unlist)
sapply(Map(substring, seq.df[[1]], first = l1, last = l1), paste, collapse="")
#[1] "THISPA" "GTAHL" "PFLD"
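Both versions above rely on substring being vectorised over its first and last arguments. As a minimal sketch of that step for a single sequence (the variable names s, idx, chars, and result are only for illustration and do not appear in the answer):

```r
# First sequence and its positions, copied from the question
s   <- "MTHISPAVYGLWAIMSVLLAAFCAY"
idx <- unlist(list(2, 3, 4, 5, 6, 7))  # flatten the sublist to a numeric vector

# With first == last, substring() returns one single-character string per position
chars <- substring(s, first = idx, last = idx)

# Collapse the single characters back into one string
result <- paste(chars, collapse = "")
print(result)
```

Map then simply repeats this step once per sequence, pairing the i-th sequence with the i-th flattened index vector.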
Or with map2_chr from purrr (str_c comes from stringr):
library(purrr)
library(stringr)  # provides str_c
map2_chr(seq.df[[1]], map.list, ~ str_c(substring(.x,
    unlist(.y), unlist(.y)), collapse=""))
Upvotes: 4
Reputation: 6669
seq.df<- rbind.data.frame("MTHISPAVYGLWAIMSVLLAAFCAY",
"MERSSAIVFPNVGTSVLSATIHLVGVTVLAHLISRRTALRGTST",
"MLFEPFWCLLDLLRWSLDTHYIPAKRPLNGGGRSSNFD")
map.list<- list(a = list(2,3,4,5,6,7),
b = list(13,14,30,31,32),
c = list(5,6,10,11))
# split each sequence into single characters, then index by the matching positions
lapply(seq_len(nrow(seq.df)),
       function(x) paste(strsplit(as.character(seq.df[x, ]), "")[[1]][unlist(map.list[[x]])], collapse = ""))
[[1]]
[1] "THISPA"
[[2]]
[1] "GTAHL"
[[3]]
[1] "PFLD"
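If a plain character vector is more convenient than a list, the same split-and-index idea can be wrapped in a small helper and run through vapply. This is only a sketch: extract_positions is a hypothetical helper name, and seqs and maps are stand-ins for seq.df[[1]] and map.list.

```r
# Stand-ins for the question's data (assumption: sequences held as a character vector)
seqs <- c("MTHISPAVYGLWAIMSVLLAAFCAY",
          "MERSSAIVFPNVGTSVLSATIHLVGVTVLAHLISRRTALRGTST",
          "MLFEPFWCLLDLLRWSLDTHYIPAKRPLNGGGRSSNFD")
maps <- list(list(2, 3, 4, 5, 6, 7),
             list(13, 14, 30, 31, 32),
             list(5, 6, 10, 11))

# Hypothetical helper: split one sequence into characters and keep the given positions
extract_positions <- function(s, pos) {
  chars <- strsplit(s, "")[[1]]
  paste(chars[unlist(pos)], collapse = "")
}

# vapply checks that every result is a single string and returns a character vector
res <- vapply(seq_along(seqs),
              function(i) extract_positions(seqs[i], maps[[i]]),
              character(1))
print(res)
```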
Upvotes: 1
Reputation: 27792
Here is a solution using mapply().
It uses an anonymous function that takes each sequence of seq.df, split into single characters, as x and the corresponding list of positions as y.
mapply( function(x,y) paste0( x[ unlist(y) ], collapse = "" ),
x = stringr::str_split( seq.df[,1], pattern = ""),
y = map.list )
[1] "THISPA" "GTAHL" "PFLD"
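One difference between the split-and-index approaches and the substring-based ones may matter if a position can fall past the end of a sequence. The following sketch uses a made-up short string and an out-of-range position (neither is from the question) to show the two behaviours:

```r
s   <- "MTHISPA"      # illustrative 7-character sequence
pos <- c(2, 3, 50)    # 50 is deliberately past the end of the string

# Indexing a split character vector out of range yields NA,
# which paste() then converts to the literal text "NA"
via_split <- paste(strsplit(s, "")[[1]][pos], collapse = "")

# substring() returns an empty string for an out-of-range position
via_substring <- paste(substring(s, pos, pos), collapse = "")

print(via_split)
print(via_substring)
```

For the data in the question all positions are in range, so both approaches give the same result.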
Upvotes: 2