PSraj

Reputation: 229

Aggregating data in R

user_id date        datetime  page
217568  6/12/2015   49:23.9   Vodafone | How to get in touch with Vodafone
135437  6/10/2015   43:35.7   My Vodafone – Manage your Vodafone Pay Monthly Account Online – Vodafone
196094  6/13/2015   33:39.4   Check the status of Vodafone’s mobile network in real-time
74197   6/6/2015    52:46.1   undefined
153501  6/5/2015    02:55.5   Device Details
71459   6/4/2015    54:05.5 
90906   6/9/2015    35:41.7   Vodafone | Mobile Phones
30886   6/9/2015    15:59.8   Vodafone | Mobile Phones
217568  6/9/2015    10:52.9   Vodafone | Mobile Phones
137324  6/16/2015   40:51.7   Vodafone | How to get in touch with Vodafone

These are the top 10 rows of my sample data. I need to aggregate the "page" column by both date and user_id (user_id is a unique identifier). Basically, for each user_id I want all the pages that user visited on a particular date in one row, separated by "_". I tried this:

 tabel <- dt[,.SD[,paste(page, sep=",", collapse="_")], by=date]

where dt is my data frame. This gives me the pages visited on a particular date, but I want them at the user_id level. How can I achieve this in R?

The resulting table should look something like this (example):

row.names   date        pages
217568     2015-06-12   page1,page2
217568     2015-06-13   page3,page5

Here page1, page2, page3 and page5 are values from the "page" column.

Upvotes: 2

Views: 187

Answers (2)

akrun

Reputation: 887108

Using data.table

 library(data.table)
 # Convert df1 to a data.table by reference, then group by user_id and the parsed date
 setDT(df1)[, list(pages=paste(page, collapse="_")),
           list(user_id, date=as.Date(date, '%m/%d/%Y'))]

Or using dplyr

 library(dplyr)
 # Group by user_id and the parsed date, then collapse each group's pages into one string
 df1 %>%
     group_by(user_id, date=as.Date(date, '%m/%d/%Y')) %>%
     summarise(pages=paste(page, collapse='_'))

Upvotes: 1

snaut

Reputation: 2535

You could use the aggregate function from the stats package. Try something like this:

aggregate(dt$page, list(dt$user_id, dt$date), FUN=paste, collapse=", ")

Be careful with the dates, though: if they are stored as POSIXlt, the coercion to factor could be problematic. If they are stored as POSIXct or as strings, this should be no problem.
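
For reference, here is a minimal sketch of the same idea using the formula interface of aggregate, so the output columns keep their names. It assumes the data frame is called dt as in the question and that date is a character string in month/day/year form. The toy rows and the page1 to page4 values are made up for illustration and are not the asker's real data.

```r
# Toy data modelled on the question's sample (values are illustrative only)
dt <- data.frame(
  user_id = c(217568, 217568, 217568, 90906),
  date    = c("6/12/2015", "6/12/2015", "6/9/2015", "6/9/2015"),
  page    = c("page1", "page2", "page3", "page4"),
  stringsAsFactors = FALSE
)

# Parse the date strings into Date objects so groups sort chronologically
dt$date <- as.Date(dt$date, format = "%m/%d/%Y")

# Collapse page within each (user_id, date) group; collapse is passed on to paste
res <- aggregate(page ~ user_id + date, data = dt, FUN = paste, collapse = "_")

# Rename the aggregated column to match the layout asked for in the question
names(res)[names(res) == "page"] <- "pages"
print(res)
```

Note that the formula method uses na.action = na.omit by default, so rows with a missing page are dropped before grouping. Empty strings are not NA and would still be pasted into the result.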

Upvotes: 2
