Brandon B
Brandon B

Reputation: 15

Need to format R data

This is a follow up to my only other question, but hopefully more direct. I need data that looks like this:

     custID   custChannel            custDate
1     151        Direct 2015-10-10 00:15:32
2     151    GooglePaid 2015-10-10 00:16:45
3     151     Converted 2015-10-10 00:17:01
4    5655      BingPaid 2015-10-11 00:20:12
5    7855 GoogleOrganic 2015-10-12 00:05:32
6    7862  YahooOrganic 2015-10-13 00:18:20
7    9655    GooglePaid 2015-10-13 00:08:35
8    9655    GooglePaid 2015-10-13 00:11:11
9    9655     Converted 2015-10-13 00:11:35
10   9888    GooglePaid 2015-10-14 00:08:35
11   9888    GooglePaid 2015-10-14 00:11:11
12   9888     Converted 2015-10-14 00:11:35

To be sorted so that the output looks like this:

  Path                                 Path Count
BingPaid                                   1
Direct>GooglePaid>Converted                1
GoogleOrganic                              1
GooglePaid>GooglePaid>Converted            2
YahooOrganic                               1

The idea is to capture customer paths (as identified by custID) and count for the entire data set how many people took that exact path (Path Count). I need to perform this over a data set of 5 million rows.

Upvotes: 1

Views: 38

Answers (1)

Rentrop
Rentrop

Reputation: 21497

Using data.table you can do this as follows:

require(data.table)
setDT(dat)[,paste(custChannel, collapse = ">"), custID][,.("path length"=.N), .(path=V1)]

Result:

                              path path length
1:     Direct>GooglePaid>Converted           1
2:                        BingPaid           1
3:                   GoogleOrganic           1
4:                    YahooOrganic           1
5: GooglePaid>GooglePaid>Converted           2

Step by step:

setDT(dat) # make dat a data.table
# get path by custID
dat_path <- dat[,paste(custChannel, collapse = ">"), custID] 
#get length by path created in the previous step
res <- dat_path[,.("path length"=.N), by=.(path=V1)] 

Have a look at dat_path and resto understand what happened.

Upvotes: 2

Related Questions