Reputation: 353
The structure of my data frame data1, which has over 1.5 million rows, is like this:
data1 <- data.frame(NEW_UPC=c(11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005994,11820005994,11820005994,11820005994,11820005994,11820005994,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995),
IRI_KEY=c(1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1078107,1078107,1078107,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073525,1073525,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106),
WEEK = c(1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1217,1221,1227,1270,1272,1273,1273,1274,1270,1272,1217,1221,1229,1230,1232,1218,1224,1229,1282,1285,1287),
END=c(1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1227,1227,1227,1273,1273,1273,1274,1274,1272,1272,1221,1221,1232,1232,1232,1229,1229,1229,1287,1287,1287))
I need to insert a column Exit.time using the values in columns WEEK and END and a cutoff value, which is 1287. Exit.time should be 0 or 1 based on the following logic:
If WEEK = 1287, then Exit.time = 0.
If WEEK is not equal to 1287 but WEEK = END, then Exit.time = 1; otherwise Exit.time = 0.
For this I tried the following for loop and it does what is required in the above dummy data set.
# Create the column first, then fill it row by row
data1$Exit.time <- 0
for (i in seq_len(nrow(data1))) {
  if (data1$WEEK[i] == 1287) {
    data1$Exit.time[i] <- 0
  } else if (data1$WEEK[i] == data1$END[i]) {
    data1$Exit.time[i] <- 1
  } else {
    data1$Exit.time[i] <- 0
  }
}
The problem is that when I use the above loop on my real data set, I still have no output after an hour. I guess looping is not efficient given the size of the data set. Is there an alternative way to do what I want? I would prefer to keep the order of rows in data1, since I need to do some merge operations later on.
Upvotes: 0
Views: 67
Reputation: 28695
Since you need Exit.time to be 1 when (WEEK == END) & WEEK != 1287 and 0 otherwise, you can use as.numeric on the result of (WEEK == END) & WEEK != 1287, which changes TRUE to 1 and FALSE to 0.
data1$Exit.time <- with(data1, as.numeric(WEEK != 1287 & WEEK == END))
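As a quick sanity check, here is a minimal sketch that compares this one-liner with the explicit if/else rule from the question. The toy data frame and the names toy, loop_result and vec_result are made up for illustration; only the WEEK and END columns and the 1287 cutoff come from the question.

```r
# Toy data (hypothetical values) covering each branch of the rule
toy <- data.frame(
  WEEK = c(1229, 1232, 1287, 1285, 1229),
  END  = c(1232, 1232, 1287, 1287, 1229)
)

# Reference: the row-by-row rule as stated in the question
loop_result <- numeric(nrow(toy))
for (i in seq_len(nrow(toy))) {
  if (toy$WEEK[i] == 1287) {
    loop_result[i] <- 0
  } else if (toy$WEEK[i] == toy$END[i]) {
    loop_result[i] <- 1
  } else {
    loop_result[i] <- 0
  }
}

# Vectorized: compare whole columns at once, then turn TRUE/FALSE into 1/0
vec_result <- with(toy, as.numeric(WEEK != 1287 & WEEK == END))

# Both approaches should agree element by element
print(identical(loop_result, vec_result))
```

One difference worth knowing about: if WEEK or END can contain NA, the vectorized form can return NA for those rows, whereas the loop's if would stop with an error.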
Upvotes: 4
Reputation: 2678
Using data.table:
library(data.table)
setDT(data1)[, Exit.time := ifelse(WEEK == 1287, 0, ifelse(WEEK != 1287 & WEEK == END, 1, 0))]
Upvotes: 0
Reputation: 4551
There are multiple ways to code this. They differ mostly in syntax; fundamentally they all do the same thing.
Base R:
data1$Exit.time <- (data1$WEEK != 1287 & data1$WEEK == data1$END)*1
This involves typing data1 a lot, so there is a shortcut:
data1 <- within(data1, {
  Exit.time <- (WEEK != 1287 & WEEK == END)*1
})
Tidyverse:
Tidyverse is a suite of packages which are great at manipulating data. We are using the package dplyr, which is part of tidyverse, so you can either load the whole thing or just dplyr:
library(tidyverse)
data1 <- data1 %>%
  mutate(
    Exit.time = (WEEK != 1287 & WEEK == END)*1
  )
(I convert from TRUE/FALSE to 0/1 by multiplying by 1. It's less to type.)
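To get a feel for how the vectorized form behaves at the scale mentioned in the question, here is a rough sketch on synthetic data. The data frame big, its size and its random values are assumptions made up for this example, not the asker's real data, and timings will vary by machine. The expression only adds a column, so the existing rows stay in their original order.

```r
# Synthetic stand-in for the real data (size and value ranges are assumptions)
set.seed(1)
n <- 1500000
big <- data.frame(
  WEEK = sample(1217:1287, n, replace = TRUE),
  END  = sample(1217:1287, n, replace = TRUE)
)

# Time the vectorized assignment over all rows at once
timing <- system.time({
  big$Exit.time <- (big$WEEK != 1287 & big$WEEK == big$END) * 1
})
print(timing)

# Sanity checks: only 0/1 values, and the cutoff week is never flagged
stopifnot(all(big$Exit.time %in% c(0, 1)))
stopifnot(all(big$Exit.time[big$WEEK == 1287] == 0))
```

The loop in the question is likely slow because it modifies the data frame once per row, while the expression above does a single pass over whole columns.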
Upvotes: 3