Reputation: 13
First of all, I have to say that I am still new to coding and R, so this might be a stupid question, but I couldn't find a question like this (maybe because I didn't know exactly what to search for).
I have a very large pagepath dataset containing ClientIDs and their browsing behavior on a website. I would like to make dummy variables indicating whether a client saw a certain page. The problem is that I want the dummy to be 1 in every row that contains that particular ClientID, not just in the row where that ClientID visited the page.
What my dataset looks like:
  ClientID pagepath
1 12345    /home
2 12345    /test1
3 12345    /test2
4 67890    /test1
5 67890    /home
6 54321    /test1
7 54321    /home
8 09876    /home
What I want as output:
  ClientID pagepath dummy_test1 dummy_test2
1 12345    /home    1           1
2 12345    /test1   1           1
3 12345    /test2   1           1
4 67890    /test1   1           0
5 67890    /home    1           0
6 54321    /test1   1           0
7 54321    /home    1           0
8 09876    /home    0           0
Help would be greatly appreciated!
Upvotes: 1
Views: 41
Reputation: 389135
We can group_by ClientID, check for the strings 'test1' and 'test2' in pagepath, and create two new columns.
library(dplyr)

df %>%
  group_by(ClientID) %>%
  mutate(dummy_test1 = +(any(grepl('test1', pagepath))),
         dummy_test2 = +(any(grepl('test2', pagepath))))

#  ClientID pagepath dummy_test1 dummy_test2
#     <int> <fct>          <int>       <int>
#1    12345 /home              1           1
#2    12345 /test1             1           1
#3    12345 /test2             1           1
#4    67890 /test1             1           0
#5    67890 /home              1           0
#6    54321 /test1             1           0
#7    54321 /home              1           0
#8     9876 /home              0           0
The + in front of any converts logical values (TRUE/FALSE) to integer values (1/0).
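For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The data frame is rebuilt by hand from the question, and ClientID is stored as character (an assumption on my part, so that the leading zero in 09876 survives). It uses as.integer() instead of the unary +, which does the same coercion but may be easier to read.

```r
library(dplyr)

# Sample data rebuilt from the question. ClientID is kept as character
# (an assumption) so that "09876" keeps its leading zero.
df <- data.frame(
  ClientID = c("12345", "12345", "12345", "67890",
               "67890", "54321", "54321", "09876"),
  pagepath = c("/home", "/test1", "/test2", "/test1",
               "/home", "/test1", "/home", "/home"),
  stringsAsFactors = FALSE
)

# Unary plus on a logical gives an integer, just like as.integer()
plus_matches_as_integer <- identical(+TRUE, as.integer(TRUE))

result <- df %>%
  group_by(ClientID) %>%
  mutate(
    # any() collapses each group to one TRUE/FALSE, which mutate()
    # then recycles to every row of that ClientID
    dummy_test1 = as.integer(any(grepl("test1", pagepath))),
    dummy_test2 = as.integer(any(grepl("test2", pagepath)))
  ) %>%
  ungroup()

print(result)
```

Note that grepl('test1', pagepath) is a substring match, so it would also flag a path such as /test10. If the paths need to match exactly, something like "/test1" %in% pagepath inside the mutate call might be a safer check.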
Upvotes: 1