Joost

Reputation: 101

How can you make a new df based on another df with date as a condition in R?

I want to analyze columns based on time, but I have no clue how to tackle this problem.

I have a data frame with all sessions of clients and want to analyze the order of the touchpoints used by each customer. I made dummies (Type1 to Type4) for the types of touchpoints used, and now I want to do some analyses on that order. First of all, I want to see whether the first chosen type has an influence on my dependent variable (DV). Therefore I want to make a df at client level with the new variables First_type1, First_type2, First_type3 and First_type4.

My Sessions data looks like:

Client id       Date     Type1    Type2    Type3     Type4
    1           01/01      0        0        1         0
    1           02/01      0        1        0         0
    2           01/01      1        0        0         0
    2           02/01      0        0        0         1
    2           02/01      0        0        0         1
    3           01/01      0        0        0         1
    3           02/02      0        0        1         0
    4           01/01      0        1        0         0
    4           02/01      0        1        0         0
    4           03/01      1        0        0         0
    4           04/01      0        1        0         0

I want the client-level output to look like:

Client id    First_type1    First_type2    First_type3    First_type4
    1             0              0              1              0
    2             1              0              0              0
    3             0              0              0              1
    4             0              1              0              0

I have no clue how to handle this, so hopefully someone can help me out. Thanks in advance.

Upvotes: 0

Views: 45

Answers (2)

Allan Cameron

Reputation: 173813

If only one of the four new columns can contain a 1 for each client, it would be much better for subsequent analysis to structure your data with a single column listing the first type used:

library(dplyr)
library(tidyr)

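df %>% 
  # One row per session and type
  pivot_longer(cols = 3:6) %>% 
  # Keep only the types that were actually used
  filter(value == 1) %>% 
  group_by(Clientid) %>% 
  # Date is a factor here, so as.numeric() compares its level codes
  filter(as.numeric(Date) == min(as.numeric(Date))) %>% 
  select(Date, first_type = name)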

#> # A tibble: 4 x 3
#> # Groups:   Clientid [4]
#>   Clientid Date  first_type
#>      <int> <fct> <chr>     
#> 1        1 01/01 Type3     
#> 2        2 01/01 Type1     
#> 3        3 01/01 Type4     
#> 4        4 01/01 Type2 
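
Comparing factor level codes only works while the levels happen to sort chronologically. As a rough sketch of a more explicit variant, the Date column could be parsed into a real Date first. This assumes the values are day/month strings and adds an arbitrary year (2020) only so that as.Date() can parse them. The column names Clientid and Type1 to Type4, and the object name first_types, are assumptions for illustration; the sample data is rebuilt from the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (column names are assumed)
df <- data.frame(
  Clientid = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  Date     = c("01/01", "02/01", "01/01", "02/01", "02/01", "01/01",
               "02/02", "01/01", "02/01", "03/01", "04/01"),
  Type1    = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  Type2    = c(0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1),
  Type3    = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  Type4    = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)
)

first_types <- df %>%
  # Assumption: Date is day/month; the year 2020 is arbitrary
  mutate(Date = as.Date(paste0(Date, "/2020"), format = "%d/%m/%Y")) %>%
  pivot_longer(cols = starts_with("Type"), names_to = "first_type") %>%
  # Keep only the types that were actually used
  filter(value == 1) %>%
  group_by(Clientid) %>%
  # Earliest session per client; with_ties = FALSE keeps a single row
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Clientid, Date, first_type)

print(first_types)
```

With with_ties = FALSE, a client with several sessions on the earliest date gets only one of them; whether that is appropriate depends on how ties should be treated in the analysis.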

Upvotes: 0

Ben

Reputation: 30474

One approach is to use pivot_longer to lengthen the data, filter to the rows where value is 1, use slice to select the first row for each client, and then use pivot_wider to widen the data into the desired format. This assumes that the rows are already in date order within each client (I was not sure about the type of your Date column).

library(tidyverse)

df %>%
  pivot_longer(cols = starts_with("Type")) %>%
  group_by(Client_id) %>%
  filter(value == 1) %>%
  # First remaining row per client; relies on the rows being in date order
  slice(1) %>%
  pivot_wider(id_cols = Client_id, names_from = name, values_from = value,
              names_prefix = "First_", values_fill = list(value = 0))

Output

# A tibble: 4 x 5
# Groups:   Client_id [4]
  Client_id First_Type3 First_Type1 First_Type4 First_Type2
      <int>       <int>       <int>       <int>       <int>
1         1           1           0           0           0
2         2           0           1           0           0
3         3           0           0           1           0
4         4           0           0           0           1
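
If the rows are not guaranteed to be in date order, a possible variation is to sort explicitly and keep the first session row per client as it is. This is only a sketch and rests on two assumptions: every session row has exactly one dummy set to 1 (as in the sample data), and Date is a day/month string, parsed here with an arbitrary year (2020). The column name Client_id and the object name first_wide are also assumptions. Because no pivoting is involved, all four columns appear in their original order, even for a type that is never used first.

```r
library(dplyr)

# Sample data from the question (column names are assumed)
df <- data.frame(
  Client_id = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  Date      = c("01/01", "02/01", "01/01", "02/01", "02/01", "01/01",
                "02/02", "01/01", "02/01", "03/01", "04/01"),
  Type1     = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  Type2     = c(0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1),
  Type3     = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  Type4     = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)
)

first_wide <- df %>%
  # Assumption: Date is day/month; the year 2020 is arbitrary
  mutate(Date = as.Date(paste0(Date, "/2020"), format = "%d/%m/%Y")) %>%
  arrange(Client_id, Date) %>%
  group_by(Client_id) %>%
  # Earliest session per client after sorting
  slice(1) %>%
  ungroup() %>%
  select(-Date) %>%
  # Type1 -> First_type1, and so on, to match the names in the question
  rename_with(~ paste0("First_", tolower(.x)), starts_with("Type"))

print(first_wide)
```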

Upvotes: 1
