I'm working on a dataset of hotel reviews. I've created a subset (440880 rows) as follows:
df2
Hotel_ID  Review_date  Negative_Rev   Positive_Rev       Negative  Positive
1         2015/08/20   bad staff      comfortable room   1         1
1         2015/08/30   No Negative    good staff         0         1
2         2015/09/24   no staff       No Positive        1         1
2         2016/02/03   No Breakfast   near city centre   1         1
2         2016/03/22   No Negative    No Positive        0         0
where Negative and Positive are indicator variables derived from Negative_Rev and Positive_Rev respectively (0 if the text is "No Negative" or "No Positive", 1 otherwise).
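To make the example reproducible, the sample rows above can be rebuilt in R and the indicators computed from the review text. This is a sketch that assumes Review_date is stored as a character string and that the comparison with "No Negative"/"No Positive" is exact and case-sensitive.

```r
# Sample rows from the question; Review_date kept as character (assumption)
df2 <- data.frame(
  Hotel_ID     = c(1L, 1L, 2L, 2L, 2L),
  Review_date  = c("2015/08/20", "2015/08/30", "2015/09/24",
                   "2016/02/03", "2016/03/22"),
  Negative_Rev = c("bad staff", "No Negative", "no staff",
                   "No Breakfast", "No Negative"),
  Positive_Rev = c("comfortable room", "good staff", "No Positive",
                   "near city centre", "No Positive"),
  stringsAsFactors = FALSE
)

# 0 when the text is exactly the placeholder, 1 otherwise
df2$Negative <- as.integer(df2$Negative_Rev != "No Negative")
df2$Positive <- as.integer(df2$Positive_Rev != "No Positive")
```

Applied literally, the rule gives Positive = 0 for the third row (its Positive_Rev is "No Positive"), whereas the printed sample shows 1 there; all other values match.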
I would like to group df2 by Hotel_ID and Review_date and create two new columns, Daily_Negative and Daily_Positive, holding the cumulative sum (cumsum) of Negative and Positive respectively.
I've tried, for example, this:
library(plyr)
df$Daily_Positive <- ddply(df, .(Review_Date, Hotel_ID), transform, Daily_Positive = cumsum(Positive))
Here is another solution using the data.table package:
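library(data.table)
setDT(df2)  # convert the data.frame to a data.table by reference
# one row per hotel and day, holding that day's totals
df2[, .(Daily_Negative = sum(Negative), Daily_Positive = sum(Positive)), by = .(Hotel_ID, Review_date)]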
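The answer above collapses each hotel and day into a single row of sums. Since the question asks for cumsum columns added to df2, a variant that keeps every row can be sketched with data.table's `:=` operator. The sample values are copied from the question, and the column names are assumed to match df2.

```r
library(data.table)

# Sample data from the question (only the columns used here)
dt <- data.table(
  Hotel_ID    = c(1L, 1L, 2L, 2L, 2L),
  Review_date = c("2015/08/20", "2015/08/30", "2015/09/24",
                  "2016/02/03", "2016/03/22"),
  Negative    = c(1L, 0L, 1L, 1L, 0L),
  Positive    = c(1L, 1L, 1L, 1L, 0L)
)

# := adds the running totals by reference within each hotel/day group;
# no rows are dropped, and cumsum follows the existing row order
dt[, `:=`(Daily_Negative = cumsum(Negative),
          Daily_Positive = cumsum(Positive)),
   by = .(Hotel_ID, Review_date)]
```

In these five sample rows each Hotel_ID/Review_date pair occurs only once, so the running totals equal Negative and Positive; they only accumulate when a hotel has several reviews on the same day.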
library(dplyr)
# note: this replaces df2 with one summary row per hotel and day
df2 <- df2 %>%
  group_by(Hotel_ID, Review_date) %>%
  summarise(Daily_Negative = sum(Negative),
            Daily_Positive = sum(Positive)) %>%
  ungroup()
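Swapping summarise() for mutate() in the same pipeline keeps one row per review and adds the running totals the question asks for. A sketch on the sample values from the question, assuming the same column names:

```r
library(dplyr)

# Sample data from the question (only the columns used here)
df2 <- data.frame(
  Hotel_ID    = c(1L, 1L, 2L, 2L, 2L),
  Review_date = c("2015/08/20", "2015/08/30", "2015/09/24",
                  "2016/02/03", "2016/03/22"),
  Negative    = c(1L, 0L, 1L, 1L, 0L),
  Positive    = c(1L, 1L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# mutate() returns one value per input row, so no rows are collapsed;
# cumsum runs within each hotel/day group in the current row order
df2 <- df2 %>%
  group_by(Hotel_ID, Review_date) %>%
  mutate(Daily_Negative = cumsum(Negative),
         Daily_Positive = cumsum(Positive)) %>%
  ungroup()
```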