alki

Reputation: 3584

R: creating a new column without a for loop

Suppose I have a data frame with a few numbers in the first column. I want to use each number as a position in a string and extract the substring that spans the 2 characters before and after that position. For example:

aggSN <- data.frame(V1=c(5,6,7,8),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"  # <- take this string
aggSN                            # <- take the numbers in the first column
# V1    V2
#  5  blah
#  6  blah
#  7  blah
#  8  blah

and create a new column V3 that looks like

aggSN                           
# V1    V2    V3
#  5  blah SDAFK   # <- took the two characters before and after the 5th character
#  6  blah DAFKS   # <- took the two characters before and after the 6th character 
#  7  blah AFKSD   # <- took the two characters before and after the 7th character 
# 10  blah SDAFJ   # <- took the two characters before and after the 10th character 
#  2  blah  AJSD   # <- here you can see that the substring is cut off at the start of the string

Currently I am using a for loop, which works, but takes a lot of time on very large data frames and large strings. Are there any alternatives to this? Thank you.

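fillvector <- character(nrow(aggSN))  # preallocate the result vector
for (j in seq_len(nrow(aggSN))) {
  # the column name must be quoted when indexing with [ , ]
  fillvector[j] <- substr(gen, aggSN[j, "V1"] - 2, aggSN[j, "V1"] + 2)
}
aggSN$V3 <- fillvector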

Upvotes: 2

Views: 136

Answers (2)

Rich Scriven

Reputation: 99331

You can use substring() without writing a loop, since it is vectorized over its start and end arguments:

aggSN <- data.frame(V1=c(5,6,7,8,2),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA" 

with(aggSN, substring(gen, V1-2, V1+2))
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD" 

So to add the new column,

aggSN$V3 <- with(aggSN, substring(gen, V1-2, V1+2))
aggSN
#   V1   V2    V3
# 1  5 blah SDAFK
# 2  6 blah DAFKS
# 3  7 blah AFKSD
# 4  8 blah FKSDA
# 5  2 blah  AJSD

If you are after something a bit faster, I would go with stringi::stri_sub in place of substring().
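
As a rough sketch of that suggestion, assuming the stringi package is installed: stri_sub() accepts the string plus start and end positions and is vectorized over them. The pmax() clamp is an addition here, not part of the original answer. It keeps the start index at 1 or above, so the 0 produced by V1 = 2 does not depend on how stri_sub() treats a zero index.

```r
library(stringi)  # assumed to be installed; not part of base R

aggSN <- data.frame(V1 = c(5, 6, 7, 8, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# Start and end positions for every row at once
from <- pmax(aggSN$V1 - 2, 1)  # added safeguard: never start before position 1
to   <- aggSN$V1 + 2

# One vectorized call fills the whole column
aggSN$V3 <- stri_sub(gen, from, to)
aggSN
```

Whether this is noticeably faster than substring() will depend on the size of the data, so it is worth timing both on a realistic sample.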

Upvotes: 4

Ricky

Reputation: 4686

aggSN$V3 <- sapply(aggSN$V1, function(x) substr(gen, x-2, x+2))

should do the trick.

> aggSN
  V1   V2    V3
1  5 blah SDAFK
2  6 blah DAFKS
3  7 blah AFKSD
4  8 blah FKSDA

With your second example (V1 = 5, 6, 7, 10, 2):

> aggSN
  V1   V2    V3
1  5 blah SDAFK
2  6 blah DAFKS
3  7 blah AFKSD
4 10 blah SDAFJ
5  2 blah  AJSD
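
For reference, a minimal sketch that should reproduce the second output, assuming the data frame holds the V1 values from the question's second listing. The vapply() line is a type-checked variant added here for comparison and is not part of the original answer.

```r
# Assumed data: the V1 values shown in the question's second listing
aggSN <- data.frame(V1 = c(5, 6, 7, 10, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# Same call as in the answer: one substr() call per element of V1
aggSN$V3 <- sapply(aggSN$V1, function(x) substr(gen, x - 2, x + 2))

# vapply() variant: declares that each call must return a single string
v3_checked <- vapply(aggSN$V1, function(x) substr(gen, x - 2, x + 2), character(1))

identical(aggSN$V3, v3_checked)
aggSN
```

Note that sapply() and vapply() still call substr() once per row, so on very large data frames a single vectorized substring() call is likely the better choice.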

Upvotes: 2
