Reputation: 2250
I have a data frame df that looks like the following:
df <- structure(list(sequence = c("CSPPPPSPSPHPRPP", "GEGSPTSPTSPKQPG",
"EAGAPAGSGAPPPAD", "PAPPKPKESKEPENA", "AKPKQQDEDPDGAAE", "GDRGGGTGNEDDDYE"
), group = c("BP", "BP", "BP", "BP", "BP", "BP")), .Names = c("sequence",
"group"), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
For each string in df$sequence, I want to get all possible substrings of 7 characters, shifting one character to the right at each iteration.
For that, I created a function called scan_core_OLpeptides. If I apply the following function:
library(stringr)  # provides str_sub()
scan_core_OLpeptides <- function(x) {
  # one iteration per 7-character window
  for (i in seq_len(nchar(x) - 7 + 1)) {
    print(str_sub(string = x, start = i, end = i + 6))
  }
}
I get the following output:
[1] "CSPPPPS" "GEGSPTS" "EAGAPAG" "PAPPKPK" "AKPKQQD" "GDRGGGT"
[1] "SPPPPSP" "EGSPTSP" "AGAPAGS" "APPKPKE" "KPKQQDE" "DRGGGTG"
[1] "PPPPSPS" "GSPTSPT" "GAPAGSG" "PPKPKES" "PKQQDED" "RGGGTGN"
[1] "PPPSPSP" "SPTSPTS" "APAGSGA" "PKPKESK" "KQQDEDP" "GGGTGNE"
[1] "PPSPSPH" "PTSPTSP" "PAGSGAP" "KPKESKE" "QQDEDPD" "GGTGNED"
[1] "PSPSPHP" "TSPTSPK" "AGSGAPP" "PKESKEP" "QDEDPDG" "GTGNEDD"
[1] "SPSPHPR" "SPTSPKQ" "GSGAPPP" "KESKEPE" "DEDPDGA" "TGNEDDD"
[1] "PSPHPRP" "PTSPKQP" "SGAPPPA" "ESKEPEN" "EDPDGAA" "GNEDDDY"
[1] "SPHPRPP" "TSPKQPG" "GAPPPAD" "SKEPENA" "DPDGAAE" "NEDDDYE"
This is exactly what I want. However, I want to store this output in an object, preferably a vector or a data.frame. I tried storing it in a list, but it did not work.
Upvotes: 0
Views: 140
Reputation: 887891
We can modify the function to create a list that stores the output from each 'i':
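library(stringr)
scan_core_OLpeptides <- function(x) {
  # number of 7-character windows; assumes all strings have the same length
  n <- nchar(x[1]) - 7 + 1
  x1 <- vector("list", n)
  # seq_len(n), not seq(nchar(x) - 7 + 1): seq() on a vector acts like seq_along()
  for (i in seq_len(n)) {
    x1[[i]] <- str_sub(string = x, start = i, end = i + 6)
  }
  x1
}
scan_core_OLpeptides(df$sequence)
#[[1]]
#[1] "CSPPPPS" "GEGSPTS" "EAGAPAG" "PAPPKPK" "AKPKQQD" "GDRGGGT"
#[[2]]
#[1] "SPPPPSP" "EGSPTSP" "AGAPAGS" "APPKPKE" "KPKQQDE" "DRGGGTG"
#[[3]]
#[1] "PPPPSPS" "GSPTSPT" "GAPAGSG" "PPKPKES" "PKQQDED" "RGGGTGN"
#[[4]]
#[1] "PPPSPSP" "SPTSPTS" "APAGSGA" "PKPKESK" "KQQDEDP" "GGGTGNE"
#[[5]]
#[1] "PPSPSPH" "PTSPTSP" "PAGSGAP" "KPKESKE" "QQDEDPD" "GGTGNED"
#[[6]]
#[1] "PSPSPHP" "TSPTSPK" "AGSGAPP" "PKESKEP" "QDEDPDG" "GTGNEDD"
#[[7]]
#[1] "SPSPHPR" "SPTSPKQ" "GSGAPPP" "KESKEPE" "DEDPDGA" "TGNEDDD"
#[[8]]
#[1] "PSPHPRP" "PTSPKQP" "SGAPPPA" "ESKEPEN" "EDPDGAA" "GNEDDDY"
#[[9]]
#[1] "SPHPRPP" "TSPKQPG" "GAPPPAD" "SKEPENA" "DPDGAAE" "NEDDDYE"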
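Since the question asks for a vector or a data.frame rather than a list, here is a minimal base R sketch of one way to get there. The names sequences, width, windows, result and all_windows are made up for illustration, and sequences is a two-element stand-in for df$sequence; treat it as one possible approach, not the only one.

```r
# Sketch: collect every 7-character window into a long data frame.
# `sequences` is a hypothetical stand-in for df$sequence.
sequences <- c("CSPPPPSPSPHPRPP", "GEGSPTSPTSPKQPG")
width <- 7

# One character vector of windows per input sequence
windows <- lapply(sequences, function(s) {
  # max(0, ...) keeps strings shorter than `width` from raising an error
  starts <- seq_len(max(0, nchar(s) - width + 1))
  substring(s, starts, starts + width - 1)
})

# Long format: one row per (sequence, window) pair
result <- data.frame(
  sequence = rep(sequences, lengths(windows)),
  start    = sequence(lengths(windows)),
  window   = unlist(windows),
  stringsAsFactors = FALSE
)

# Flat character vector, if that is all that is needed
all_windows <- result$window
```

Because each sequence is processed on its own, this version does not require all strings to have the same length.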
Upvotes: 1
Reputation: 51592
I'd use the zoo package for this:
library(zoo)
# split each sequence into characters, then paste every rolling window of 7
sapply(strsplit(df$sequence, ''), function(chars)
  rollapply(chars, 7, by = 1, function(w) paste0(w, collapse = '')))
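If zoo is not installed, the same kind of window-by-sequence matrix can be sketched in base R. The names sequences, width, starts and mat are hypothetical, sequences stands in for df$sequence, and the sketch assumes all strings have the same length.

```r
# Sketch: base R rolling windows, one column per sequence.
# `sequences` is a hypothetical stand-in for df$sequence.
sequences <- c("CSPPPPSPSPHPRPP", "GEGSPTSPTSPKQPG")
width <- 7

# Window start positions; assumes every string is as long as the first one
starts <- seq_len(nchar(sequences[1]) - width + 1)

# substring() is vectorised over the start/end positions
mat <- sapply(
  sequences,
  function(s) substring(s, starts, starts + width - 1),
  USE.NAMES = FALSE
)

dim(mat)  # rows = windows, columns = sequences
```

as.data.frame(mat) turns the matrix into a data.frame with one column per sequence, and as.vector(mat) flattens it column by column.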
Upvotes: 3