Rakesh Das

Calculate the last value of a vector with conditions

For each path, I want to find the last non-direct touchpoint. The input data frame looks like this:

path = c("path1","path2","path3","path4","path5","path6","path7") 
c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web") 
c2 = c("channel2",NA,"channel23",NA,"channel11","channel5", "direct_app") 
c3 = c("direct_app",NA,"direct_app",NA, NA,"direct_app",NA)
c4 = c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 = c(NA,NA,"direct_web",NA,NA,NA,NA)
df_input <- data.frame(path,c1,c2,c3,c4,c5)

I want to add a new column that holds, for each row, the last non-direct value. NB: "direct" means either direct_web or direct_app.

The output data frame looks like the following:

path = c("path1","path2","path3","path4","path5","path6","path7") 
c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web") 
c2 = c("channel2",NA,"channel23",NA,"channel11","channel5", "direct_app") 
c3 = c("direct_app",NA,"direct_app",NA,NA,"direct_app",NA)
c4 = c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 = c(NA,NA,"direct_web",NA,NA,NA,NA)
last_non_direct <- c("channel2","direct_app","channel23","channel45","channel11","channel5","direct_app")
df_output <- data.frame(path,c1,c2,c3,c4,c5,last_non_direct)

If a path consists only of direct touchpoints (i.e. direct_web / direct_app), the new column takes the last direct value, as shown in the output data frame. If the path has no direct touchpoint at all, it takes the last channel.
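To pin down the rule, here is a minimal sketch of the per-path logic as an R function. The name `last_non_direct_one` is hypothetical, and it assumes "direct" means exactly `direct_web` or `direct_app`. This only captures the logic; calling it once per path in a loop is exactly the slow pattern described next.

```r
# Hypothetical helper (not from the question): applies the rule to the
# touchpoints of ONE path, given as a character vector that may contain NAs.
# "direct" is assumed to mean exactly "direct_web" or "direct_app".
last_non_direct_one <- function(x, direct = c("direct_web", "direct_app")) {
  x <- x[!is.na(x)]                    # ignore empty touchpoints
  non_direct <- x[!(x %in% direct)]    # touchpoints that are not direct
  if (length(non_direct) > 0) {
    tail(non_direct, 1)                # last non-direct channel
  } else {
    tail(x, 1)                         # only direct touchpoints: last direct
  }
}

last_non_direct_one(c("channel1", "channel2", "direct_app", NA, NA))  # "channel2"
last_non_direct_one(c("direct_web", "direct_app", NA, NA, NA))        # "direct_app"
```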

I have implemented this with a for loop, but my data is large (about 1 million paths) and it takes almost 30 minutes to run. Any help using dplyr or a similarly fast method would be really appreciated.

Answers (1)

Andrew Gustar

Using base R you could do something like this...

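#find the last value that is neither NA nor direct (the 'path' column is never
#NA or direct, so every row returns at least its path id)
out1 <- apply(df_input,1,function(x) tail(x[!is.na(x) & !grepl("direct",x)],1))
#find the last non-NA value
out2 <- apply(df_input,1,function(x) tail(x[!is.na(x)],1))
#rows where only the path id survived are all-direct: use the last non-NA value instead
out1[grepl("path",out1)] <- out2[grepl("path",out1)]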

out1 
[1] "channel2"   "direct_app" "channel23"  "channel45"  "channel11"  "channel5"   "direct_app"
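A possible alternative, sketched here rather than benchmarked: `apply()` converts the data frame to a matrix and calls an R function once per row, which still scales with the number of paths. Because there are only a few touchpoint columns, you can instead sweep the columns left to right and update two running results ("last non-NA" and "last non-direct") with vectorised assignments over all rows at once. The function name `last_non_direct_vec` and its `direct` argument are hypothetical; the sketch assumes the touchpoint columns are `c1` to `c5` and that "direct" means exactly `direct_web` or `direct_app`.

```r
# Touchpoint data from the question (stringsAsFactors = FALSE keeps the
# columns as character in R < 4.0 as well)
df_input <- data.frame(
  path = c("path1","path2","path3","path4","path5","path6","path7"),
  c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web"),
  c2 = c("channel2",NA,"channel23",NA,"channel11","channel5","direct_app"),
  c3 = c("direct_app",NA,"direct_app",NA,NA,"direct_app",NA),
  c4 = c(NA,NA,"direct_app",NA,NA,NA,NA),
  c5 = c(NA,NA,"direct_web",NA,NA,NA,NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the R-level loop runs once per column, not once per
# path; each step is a vectorised update over all rows.
last_non_direct_vec <- function(df, cols, direct = c("direct_web", "direct_app")) {
  last_any <- rep(NA_character_, nrow(df))  # last non-NA value seen so far
  last_nd  <- rep(NA_character_, nrow(df))  # last non-direct value seen so far
  for (col in cols) {
    v <- as.character(df[[col]])
    present <- !is.na(v)
    last_any[present] <- v[present]
    nd <- present & !(v %in% direct)
    last_nd[nd] <- v[nd]
  }
  # Rows with no non-direct touchpoint fall back to the last direct value
  ifelse(is.na(last_nd), last_any, last_nd)
}

df_input$last_non_direct <- last_non_direct_vec(df_input, paste0("c", 1:5))
df_input$last_non_direct
```

On the sample data this reproduces the `last_non_direct` column from the question. The design choice is to trade a loop over rows for a loop over columns, which stays short when each path has a small, fixed number of touchpoint columns.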
