Reputation: 73
I want to find the last non-direct channel for each path. The input data frame looks like this:
path = c("path1","path2","path3","path4","path5","path6","path7")
c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web")
c2 = c("channel2",NA,"channel23",NA,"channel11","channel5", "direct_app")
c3 = c("direct_app",NA,"direct_app",NA, NA,"direct_app",NA)
c4 = c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 = c(NA,NA,"direct_web",NA,NA,NA,NA)
df_input <- data.frame(path,c1,c2,c3,c4,c5)
I want to add a new column holding the last non-direct channel of each path. NB: a direct channel can be direct, direct_web or direct_app.
The output data frame looks like the following:
path = c("path1","path2","path3","path4","path5","path6","path7")
c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web")
c2 = c("channel2",NA,"channel23",NA,"channel11","channel5", "direct_app")
c3 = c("direct_app",NA,"direct_app",NA,NA,"direct_app",NA)
c4 = c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 = c(NA,NA,"direct_web",NA,NA,NA,NA)
last_non_direct <- c("channel2","direct_app","channel23","channel45","channel11","channel5","direct_app")
df_output <- data.frame(path,c1,c2,c3,c4,c5,last_non_direct)
If a path consists only of direct channels (i.e. direct_web / direct_app), the new column takes the last direct channel, as shown in the output data frame. If there is no direct channel at all in the path, it takes the last channel.
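To pin the rule down, here is a minimal sketch for a single path given as a character vector. The helper name `last_non_direct_one` is hypothetical, and treating any value that starts with "direct" as a direct channel is an assumption based on the examples above.

```r
# Hypothetical helper: apply the rule to one path (a character vector of channels).
# Assumption: a channel is "direct" when its name starts with "direct".
last_non_direct_one <- function(channels) {
  channels <- channels[!is.na(channels)]            # ignore empty slots
  if (length(channels) == 0) return(NA_character_)  # nothing recorded for this path
  non_direct <- channels[!grepl("^direct", channels)]
  if (length(non_direct) > 0) {
    tail(non_direct, 1)   # last non-direct channel
  } else {
    tail(channels, 1)     # only direct channels: fall back to the last one
  }
}

last_non_direct_one(c("direct", "channel23", "direct_app", "direct_app", "direct_web"))
# [1] "channel23"
last_non_direct_one(c("direct_web", "direct_app", NA, NA, NA))
# [1] "direct_app"
```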
I have implemented this using a for loop, but since my data is quite big (I have 1 million paths), it takes almost 30 minutes. Any help using dplyr or a similarly fast method would be really appreciated.
Upvotes: 0
Views: 43
Reputation: 18435
Using base R you could do something like this...
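# last value in each row that is neither NA nor a direct channel
# (each row includes 'path', so an all-direct row yields its path name here)
out1 <- apply(df_input,1,function(x) tail(x[!is.na(x) & !grepl("direct",x)],1))
# last non-NA value in each row
out2 <- apply(df_input,1,function(x) tail(x[!is.na(x)],1))
# rows that fell back to the path name had only direct channels:
# replace them with the last non-NA value
out1[grepl("path",out1)] <- out2[grepl("path",out1)]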
out1
[1] "channel2" "direct_app" "channel23" "channel45" "channel11" "channel5" "direct_app"
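Since `apply` still calls an R function once per row, it may remain slow on a million paths. A vectorised variant of the same idea could look like the sketch below. It is not benchmarked; the helper `last_true` is a hypothetical name, and the `"^direct"` pattern assumes every direct channel starts with "direct". It also works on the channel columns only, so it does not rely on the `path` column as a fallback marker.

```r
# Rebuild the input from the question
path <- c("path1","path2","path3","path4","path5","path6","path7")
c1 <- c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web")
c2 <- c("channel2",NA,"channel23",NA,"channel11","channel5","direct_app")
c3 <- c("direct_app",NA,"direct_app",NA,NA,"direct_app",NA)
c4 <- c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 <- c(NA,NA,"direct_web",NA,NA,NA,NA)
df_input <- data.frame(path, c1, c2, c3, c4, c5, stringsAsFactors = FALSE)

# Channel columns only, as a character matrix
m <- as.matrix(df_input[, paste0("c", 1:5)])

present    <- !is.na(m)                        # cell holds a channel
non_direct <- present & !grepl("^direct", m)   # cell holds a non-direct channel

# Hypothetical helper: column index of the last TRUE in each row.
# Each TRUE cell is replaced by its column number, so the row maximum is the last TRUE.
# Rows with no TRUE at all return 1; they are handled separately below.
last_true <- function(x) max.col(x * col(x), ties.method = "first")

# Prefer the last non-direct channel; otherwise fall back to the last channel present
idx <- ifelse(rowSums(non_direct) > 0, last_true(non_direct), last_true(present))

# Pick one cell per row using (row, column) matrix indexing
df_input$last_non_direct <- unname(m[cbind(seq_len(nrow(m)), idx)])
df_input$last_non_direct
# [1] "channel2"   "direct_app" "channel23"  "channel45"  "channel11"  "channel5"   "direct_app"
```

All steps operate on whole matrices, so the cost grows with the number of cells rather than with the number of R function calls. A path with no channels at all would come out as NA.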
Upvotes: 2