BombSite_A

Reputation: 310

How can I create a new column in a data frame based on a subset of the same data?

Sorry if this is a dumb question; I'm new to R. I have a data set like this:

   t a b
1  1 1 0
2  2 1 0
3  3 1 4
4  4 1 0
5  5 1 2
6  1 2 0
7  2 2 1
8  3 2 3
9  4 2 0
10 5 2 5

I want to add a new column c which is 1 if b is zero and no earlier b in the same a group (ordered by t) was nonzero, and 0 otherwise. Basically, I want to mark the leading zeros for each a, based on the t index. The result should look like this:

   t a b c
1  1 1 0 1
2  2 1 0 1
3  3 1 4 0
4  4 1 0 0
5  5 1 2 0
6  1 2 0 1
7  2 2 1 0
8  3 2 3 0
9  4 2 0 0
10 5 2 5 0

I tried running

data$c <- ifelse(nrow(subset(data, t < data$t & a == data$a & b != 0)) == 0 & data$b == 0, 1, 0)

but that just set c to 1 wherever b was 0. What am I doing wrong? How would you approach this? Thanks!

Reproducible example:

txt <- "t a b
1  1 1 0
2  2 1 0
3  3 1 4
4  4 1 0
5  5 1 2
6  1 2 0
7  2 2 1
8  3 2 3
9  4 2 0
10 5 2 5"

data <- read.table(text = txt, header = TRUE)

data$c <- ifelse(nrow(subset(data, t < data$t & a == data$a & b != 0)) == 0 & data$b == 0, 1, 0)
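A quick check shows why the attempt only reproduces `b == 0`. The sketch below rebuilds the table above as a data frame with columns `t`, `a` and `b`, confirms that the `subset()` call returns zero rows (inside `subset()`, `t` and `data$t` are the same column, so `t < data$t` is never TRUE), and then applies the intended rule row by row. The names `attempt` and `earlier` are illustrative only. This is a minimal sketch, not a tuned solution: it scans the whole data frame once per row, so it is only practical for small data.

```r
# Rebuild the question's table (columns t, a, b) so the snippet is self-contained.
data <- data.frame(
  t = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  a = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  b = c(0, 0, 4, 0, 2, 0, 1, 3, 0, 5)
)

# Inside subset(), `t` and `data$t` both refer to the t column, so
# `t < data$t` is FALSE for every row and subset() returns zero rows.
nrow(subset(data, t < data$t & a == data$a & b != 0))  # 0

# nrow(...) == 0 is therefore a single TRUE, and the ifelse() condition
# reduces to data$b == 0: exactly the behaviour described in the question.
attempt <- ifelse(nrow(subset(data, t < data$t & a == data$a & b != 0)) == 0 &
                    data$b == 0, 1, 0)

# Row-wise version of the intended logic: for row i, look only at the
# earlier rows (smaller t) that belong to the same group a.
data$c <- sapply(seq_len(nrow(data)), function(i) {
  earlier <- data$a == data$a[i] & data$t < data$t[i]
  # 1 if this b is zero and no earlier b in the group is nonzero
  as.integer(data$b[i] == 0 && !any(data$b[earlier] != 0))
})
data
```

The resulting `c` column matches the desired output shown above.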

Upvotes: 1

Views: 45

Answers (1)

Maurits Evers

Reputation: 50678

How about the following, using dplyr and cumsum():

library(dplyr)
df %>%
    group_by(a) %>%
    arrange(a, time) %>%
    mutate(c = ifelse(b != 0 | cumsum(b) > 0, 0, 1)) %>%
    ungroup()
#    time     a     b     c
#   <int> <int> <int> <dbl>
# 1     1     1     0  1.00
# 2     2     1     0  1.00
# 3     3     1     4  0
# 4     4     1     0  0
# 5     5     1     2  0
# 6     1     2     0  1.00
# 7     2     2     1  0
# 8     3     2     3  0
# 9     4     2     0  0
#10     5     2     5  0
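For comparison, here is a base R sketch of the same condition (not part of the original answer). It assumes the column names of the sample data (`time`, `a`, `b`), orders the rows by `a` and then `time`, and uses `ave()` to compute the running total of `b` within each `a` group, the counterpart of `group_by(a)` plus `cumsum(b)`. The name `run_b` is illustrative only.

```r
# Same values as the answer's sample data (columns time, a, b).
df <- data.frame(
  time = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  a    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  b    = c(0, 0, 4, 0, 2, 0, 1, 3, 0, 5)
)

# Order by group, then time, so "previous" means earlier in time.
df <- df[order(df$a, df$time), ]

# Running total of b within each group a.
run_b <- ave(df$b, df$a, FUN = cumsum)

# Same test as the dplyr pipeline: 0 once b is nonzero or the running
# total is positive, otherwise 1.
df$c <- ifelse(df$b != 0 | run_b > 0, 0, 1)
df
```

This gives the same `c` column as the dplyr output above (1 only for rows 1, 2 and 6).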

Sample data

df <- read.table(text =
    "time a b
1     1 1 0
2     2 1 0
3     3 1 4
4     4 1 0
5     5 1 2
6     1 2 0
7     2 2 1
8     3 2 3
9     4 2 0
10    5 2 5", header = TRUE)
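One caveat worth checking: `cumsum(b) > 0` works here because every `b` in the data is non-negative. If negative values were possible, the running sum could fall back to 0 and a later zero would be flagged again. Counting nonzero entries with `cumsum(b != 0)` avoids that, because the count never decreases. The vector below is a hypothetical example, not data from the question.

```r
# Hypothetical group where b takes a negative value (not from the question).
b <- c(0, 2, -2, 0)

# cumsum(b) is 0 2 0 0, so the running-sum test wrongly flags row 4.
flag_sum <- ifelse(b != 0 | cumsum(b) > 0, 0, 1)

# cumsum(b != 0) is 0 1 2 2 and never decreases, so only the
# leading zero is flagged.
flag_count <- as.integer(cumsum(b != 0) == 0)

flag_sum    # 1 0 0 1
flag_count  # 1 0 0 0
```

Inside the answer's pipeline, the same idea would read `mutate(c = as.integer(cumsum(b != 0) == 0))`, which also returns an integer column instead of a double one.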

Upvotes: 1
