Reputation: 3584
Suppose I have a data frame with a few numbers in the first column. I want to take these numbers, use them as locations in a string, and take a substring that includes 2 characters before and after that location. To clarify,
aggSN <- data.frame(V1=c(5,6,7,8),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA" # <- take this string
aggSN # <- take the numbers in the first column
# V1 V2
# 5 blah
# 6 blah
# 7 blah
# 8 blah
and create a new column V3 that looks like this (shown for the positions 5, 6, 7, 10 and 2, so that a truncated case is included):
aggSN
# V1 V2 V3
# 5 blah SDAFK # <- took the two characters before and after the 5th character
# 6 blah DAFKS # <- took the two characters before and after the 6th character
# 7 blah AFKSD # <- took the two characters before and after the 7th character
# 10 blah SDAFJ # <- took the two characters before and after the 10th character
# 2 blah AJSD # <- here you can see that the substring is cut off at the start of the string
Currently I am using a for loop, which works, but takes a lot of time on very large data frames and large strings. Are there any alternatives to this? Thank you.
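fillvector <- character(nrow(aggSN))
for (j in seq_len(nrow(aggSN))) {
  # two characters either side of the position stored in V1
  fillvector[j] <- substr(gen, aggSN[j, "V1"] - 2, aggSN[j, "V1"] + 2)
}
aggSN$V3 <- fillvector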
Upvotes: 2
Views: 136
Reputation: 99331
You can use substring() without writing a loop:
aggSN <- data.frame(V1=c(5,6,7,8,2),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"
with(aggSN, substring(gen, V1-2, V1+2))
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD"
So to add the new column,
aggSN$V3 <- with(aggSN, substring(gen, V1-2, V1+2))
aggSN
# V1 V2 V3
# 1 5 blah SDAFK
# 2 6 blah DAFKS
# 3 7 blah AFKSD
# 4 8 blah FKSDA
# 5 2 blah AJSD
If you are after something a bit faster, I would go with stringi::stri_sub() in place of substring().
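A minimal sketch of the stringi variant, assuming the stringi package is installed (the code falls back to substring() if it is not). One difference to watch for: stri_sub() counts negative indices from the end of the string, whereas substring() simply truncates at the start, so the start position is clamped to 1 here. The from and to helper vectors are just illustrative names.

```r
aggSN <- data.frame(V1 = c(5, 6, 7, 8, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# Clamp the start to 1 so that positions near the beginning of the string
# are truncated rather than interpreted as offsets from the end.
from <- pmax(aggSN$V1 - 2, 1)
to <- aggSN$V1 + 2

if (requireNamespace("stringi", quietly = TRUE)) {
  # stri_sub() is vectorized over from/to, like substring()
  aggSN$V3 <- stringi::stri_sub(gen, from, to)
} else {
  # base R fallback when stringi is not available
  aggSN$V3 <- substring(gen, from, to)
}
aggSN
```

Whether this is noticeably faster will depend on the size of the data, so it is worth timing both versions on your own input.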
Upvotes: 4
Reputation: 4686
aggSN$V3 <- sapply(aggSN$V1, function(x) substr(gen, x-2, x+2))
should do the trick.
> aggSN
V1 V2 V3
1 5 blah SDAFK
2 6 blah DAFKS
3 7 blah AFKSD
4 8 blah FKSDA
With your different example (V1 = 5, 6, 7, 10, 2):
> aggSN
V1 V2 V3
1 5 blah SDAFK
2 6 blah DAFKS
3 7 blah AFKSD
4 10 blah SDAFJ
5 2 blah AJSD
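For reference, a small self-contained sketch that reproduces the second output, assuming the positions 5, 6, 7, 10 and 2 from the question's expected result. It also checks the sapply() result against a direct vectorized substring() call; the name vectorized is only used for illustration.

```r
gen <- "AJSDAFKSDAFJKLASDFKJKA"
aggSN <- data.frame(V1 = c(5, 6, 7, 10, 2), V2 = "blah")

# sapply() calls substr() once per position
aggSN$V3 <- sapply(aggSN$V1, function(x) substr(gen, x - 2, x + 2))

# substring() handles all positions in a single call
vectorized <- substring(gen, aggSN$V1 - 2, aggSN$V1 + 2)

identical(aggSN$V3, vectorized)
```

sapply() still runs an R-level function call per row, so on very large data frames the single substring() call is likely to be the cheaper of the two.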
Upvotes: 2