R_Student
R_Student

Reputation: 789

How to count events UP TO certains conditions that invole datetime

I have a dataset that has the IDs of certain individuals that tried to breach a private network connection, there are only two types of breaches "X" and "Y" and the dataset will let you know if the attempt was successful (1) or unsuccessful (0) I have to compute how many times a person has tried to breach the network UP TO their first successful breach of type X and/or Y.

To illustrate I have to do my analysis as follows:

I have to organize by date in ascending order then count how many times an ID tried to breach the network until this ID had a successful breach and whether it was of type X or type Y such as:

ID  DATE       SUCCESS  TYPE
007 1/01/2021     0       X
007 1/01/2021     0       X
007 1/01/2021     1       X
007 2/01/2021     0       X
007 3/01/2021     1       Y

In this table we see that agent 007 made chronological 3 attempts before his FIRST successfully type X breach of the network, and that it only took him one attempt to do a type Y breach of the network

I have been hitting my head against a wall trying to come up with a code that would let me do that calculation for the following dataset:

library(tidyverse)
dataset<- tibble::tribble(
  ~DATE,            ~ID,    ~SUCCESS,   ~TYPE,
  "19/04/2021", "52952",    "0",    "X",
  "19/04/2021", "46102",    "0",    "X",
  "20/04/2021", "40005",    "1",    "X",
  "20/04/2021", "29872",    "0",    "X",
  "21/04/2021", "52952",    "0",    "X",
  "21/04/2021", "46102",    "1",    "Y",
  "22/04/2021", "46102",    "0",    "X",
  "22/04/2021", "46102",    "1",    "Y",
  "23/04/2021", "46102",    "0",    "Y",
  "23/04/2021", "46102",    "1",    "Y",
  "24/04/2021", "46102",    "0",    "Y",
  "24/04/2021", "40005",    "0",    "Y",
  "25/04/2021", "52952",    "0",    "Y",
  "25/04/2021", "46102",    "1",    "Y",
  "26/04/2021", "40005",    "0",    "X",
  "26/04/2021", "52952",    "1",    "X",
  "27/04/2021", "46102",    "1",    "X",
  "27/04/2021", "40005",    "1",    "X",
  "28/04/2021", "52952",    "1",    "X",
  "28/04/2021", "52952",    "0",    "X",
)

What have I done?

I first started by grouping the ID and DATES in order to select the very first attempt of type X and Y that way I know when was the very first attempt giving that this is Zero, and then I have been figuring out a way to select for each ID the "nearest" successful attempt without any luck as I always end up selecting the very last successful attempt but this excessive only calls for the very first attempt... Also, I have to be careful when selecting the first event because there may be cases where the very first attempt is a success.

Upvotes: 0

Views: 49

Answers (2)

Ben
Ben

Reputation: 30474

If you group_by both ID and TYPE and have them arranged by DATE, you can get the index of the first row where SUCCESS is equal to 1.

Note - you'll have warnings and Inf for events where there is no SUCCESS of 1 for a given ID and TYPE.

library(tidyverse)

dataset %>%
  mutate(DATE = as.Date(DATE, format = "%d/%m/%Y")) %>%
  group_by(ID, TYPE) %>%
  arrange(ID, TYPE, DATE) %>%
  summarise(EVENTS = min(which(SUCCESS == 1)))

Both min and which are base R. Consider the following data subset for 52952 and type "X":

2021-04-19 52952 0       X    
2021-04-21 52952 0       X    
2021-04-26 52952 1       X    
2021-04-28 52952 1       X    
2021-04-28 52952 0       X 

The values for success are 0, 0, 1, 1, 0 when ordered by date.

If you have a vector of those values, which will tell you the index or position, where SUCCESS is equal to 1. To demonstrate, if I create a vector vec with those same values:

vec <- c(0, 0, 1, 1, 0)
vec
[1] 0 0 1 1 0

which(vec == 1)
[1] 3 4

then which tells me that indexes/positions 3 and 4 (third and fourth value) satisfy the condition of equaling 1.

Then, min is added to take the minimum value. So, for the result from which (3 and 4), the minimum value is 3. It follows then that 3 events are needed to reach the first SUCCESS value that equals 1.

Output

# A tibble: 7 x 3
# Groups:   ID [4]
  ID    TYPE  EVENTS
  <chr> <chr>  <dbl>
1 29872 X        Inf
2 40005 X          1
3 40005 Y        Inf
4 46102 X          3
5 46102 Y          1
6 52952 X          3
7 52952 Y        Inf

Upvotes: 2

tonybot
tonybot

Reputation: 655

dataset$DATE <- lubridate::dmy(dataset$DATE)

dataset %>%
  arrange(ID, TYPE, DATE) %>%
  group_by(ID, TYPE) %>%
  # creating variable to denote when success status changes
  mutate(RLEID = rleid(SUCCESS == 0)) %>%
  # keeping the row if the attempt was unsuccessful (SUCCESS == 0) and before a successful attempt (RLEID == 1)
  filter(SUCCESS == 0 & RLEID == 1) %>%
  # finding the counts
  summarise(n = n())

Some people (ID = 29872 TYPE = X, ID = 40005 TYPE = Y, ID = 52952 TYPE = Y) never had a successful attempt. Did you still want to include them?

Upvotes: 1

Related Questions