Reputation: 101
I want to analyze columns based on time, but I have no clue how to tackle this problem.
I have a dataframe with all client sessions and want to analyze the order of the touchpoints each customer used. I made dummies (Type1 to Type4) for the types of touchpoints used, and now I want to run some analyses on the order. First of all, I want to see whether the first chosen type has an influence on my dependent variable (dv). Therefore I want to make a df at client level with the new variables First_type1, First_type2, First_type3 and First_type4.
My Sessions data looks like:
Client id  Date   Type1  Type2  Type3  Type4
1          01/01  0      0      1      0
1          02/01  0      1      0      0
2          01/01  1      0      0      0
2          02/01  0      0      0      1
2          02/01  0      0      0      1
3          01/01  0      0      0      1
3          02/02  0      0      1      0
4          01/01  0      1      0      0
4          02/01  0      1      0      0
4          03/01  1      0      0      0
4          04/01  0      1      0      0
I want to have Client output that looks like:
Client id  First_type1  First_type2  First_type3  First_type4
1          0            0            1            0
2          1            0            0            0
3          0            0            0            1
4          0            1            0            0
I have no clue how to handle this, so hopefully someone can help me out. Thanks in advance.
Upvotes: 0
Views: 45
Reputation: 173813
If only one of the four new columns can contain a 1 for each user, it would be much better for subsequent analysis to structure your data with a single column listing the first type used:
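library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = 3:6) %>%          # one row per session and type
  filter(value == 1) %>%                # keep only the type actually used
  group_by(Clientid) %>%
  # Date is a factor here, so as.numeric() compares level codes; this matches
  # date order only when the levels happen to sort chronologically
  filter(as.numeric(Date) == min(as.numeric(Date))) %>%
  select(Date, first_type = name)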
#> # A tibble: 4 x 3
#> # Groups:   Clientid [4]
#>   Clientid Date  first_type
#>      <int> <fct> <chr>
#> 1        1 01/01 Type3
#> 2        2 01/01 Type1
#> 3        3 01/01 Type4
#> 4        4 01/01 Type2
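If the dates are not guaranteed to sort correctly as factor levels, or a client can have several sessions on the first date, a slightly more defensive variant may help. The following is only a sketch under some assumptions: the sample data is retyped from the question, the column is called Clientid, Date is read as day/month, and the year 2020 is a placeholder added purely so the strings can be parsed. slice_min() needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Sample sessions retyped from the question (column names are assumptions)
df <- data.frame(
  Clientid = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  Date     = c("01/01", "02/01", "01/01", "02/01", "02/01", "01/01",
               "02/02", "01/01", "02/01", "03/01", "04/01"),
  Type1    = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  Type2    = c(0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1),
  Type3    = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  Type4    = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)
)

first_types <- df %>%
  # Parse to real dates; day/month order and the year 2020 are assumptions
  mutate(Date = as.Date(paste0(Date, "/2020"), format = "%d/%m/%Y")) %>%
  pivot_longer(cols = starts_with("Type"), names_to = "first_type") %>%
  filter(value == 1) %>%
  group_by(Clientid) %>%
  # Earliest session per client; with_ties = FALSE keeps a single row
  # even when several sessions share the first date
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Clientid, Date, first_type)

first_types
```

On the sample data this should give one row per client, with first_type taking the values Type3, Type1, Type4 and Type2 for clients 1 to 4.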
Upvotes: 0
Reputation: 30474
One way to consider is using pivot_longer to lengthen the data, filtering rows with a value of 1, slice to select the first row per client, and pivot_wider to widen the data into the desired format. This assumes that the dates are already in chronological order (I was not sure about the type of your Date column).
library(tidyverse)

df %>%
  pivot_longer(cols = starts_with("Type")) %>%  # one row per session and type
  group_by(Client_id) %>%
  filter(value == 1) %>%                        # keep only the type actually used
  slice(1) %>%                                  # first remaining row per client
  pivot_wider(id_cols = Client_id, names_from = name, values_from = value,
              names_prefix = "First_", values_fill = list(value = 0))
Output
# A tibble: 4 x 5
# Groups:   Client_id [4]
  Client_id First_Type3 First_Type1 First_Type4 First_Type2
      <int>       <int>       <int>       <int>       <int>
1         1           1           0           0           0
2         2           0           1           0           0
3         3           0           0           1           0
4         4           0           0           0           1
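Since the dummies are already in wide form, a different route that skips the pivot steps entirely is to sort the sessions and keep the first row per client. This is a rough sketch, not a drop-in replacement, and it rests on a few assumptions: the sample data is retyped from the question, Date is read as day/month, the year 2020 is a placeholder added only so the strings can be parsed, and each session flags at most one type. rename_with() needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample sessions retyped from the question (column names are assumptions)
df <- data.frame(
  Client_id = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  Date      = c("01/01", "02/01", "01/01", "02/01", "02/01", "01/01",
                "02/02", "01/01", "02/01", "03/01", "04/01"),
  Type1     = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  Type2     = c(0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1),
  Type3     = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  Type4     = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)
)

first_by_client <- df %>%
  # Parse to real dates; day/month order and the year 2020 are assumptions
  mutate(Date = as.Date(paste0(Date, "/2020"), format = "%d/%m/%Y")) %>%
  # Drop sessions where no type is flagged (none in the sample data)
  filter(Type1 + Type2 + Type3 + Type4 > 0) %>%
  # Sort explicitly so slice(1) really picks the earliest session
  arrange(Client_id, Date) %>%
  group_by(Client_id) %>%
  slice(1) %>%
  ungroup() %>%
  select(-Date) %>%
  # Type1 -> First_type1, and so on, matching the names in the question
  rename_with(~ paste0("First_", tolower(.x)), .cols = starts_with("Type"))

first_by_client
```

A side effect of this version is that all four First_ columns are always present and stay in their original order, even when some type is never the first one used.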
Upvotes: 1