Reputation: 67
I have a data frame in R that looks something like this:
person purch_date num_purchased
Alex 2011-01-01 6
Alex 2011-06-05 5
Alex 2012-03-04 6
Beth 2011-02-04 7
Beth 2012-05-21 8
Beth 2013-11-11 10
Candy 2007-08-09 3
Candy 2009-10-01 2
Candy 2013-12-02 2
I have sorted it first by "person" and second by "purch_date".
I am trying to count how many increases in the "num_purchased" column occur for each person. I know the code to count the number of unique values for the column or how many times the value changed, but this is not what I am looking for as I only want to know how many times the value increased from one date to the next. Ideally, the output would look something like:
person num_increases
Alex 1
Beth 2
Candy 0
Upvotes: 1
Views: 372
Reputation: 99341
If you want a vector, you could use tapply
. df
is your original data.
foo <- function(x) sum(diff(x) > 0)
with(df, tapply(num_purchased, person, foo))
# Alex Beth Candy
# 1 2 0
Or with dplyr
and still using foo
library(dplyr)
group_by(df, person) %>% summarize(increases = foo(num_purchased))
# person increases
# 1 Alex 1
# 2 Beth 2
# 3 Candy 0
Upvotes: 1
Reputation: 3501
Here is a data.table approach which will only give you result with increases > 0, i.e. those = 0 will not be tabulated. FYR though
library(data.table)
setDT(df) ## set your data frame as data table
df[, diff(num_purchased), by=person][V1>0, .N, by=person]
# person N
# 1: Alex 1
# 2: Beth 2
Incorporate @Arun's comment. More compact and get the "0" count.
df[, sum(diff(num_purchased) > 0), by=person]
# person V1
# 1: Alex 1
# 2: Beth 2
# 3: Candy 0
Upvotes: 2
Reputation: 44330
You can get data in the format you want in a 1-liner with aggregate
:
aggregate(num_purchased~person, data=dat, function(x) sum(diff(x) > 0))
# person num_purchased
# 1 Alex 1
# 2 Beth 2
# 3 Candy 0
If you instead prefer a named vector for the output, I would suggest tapply
as described in Richard Scriven's answer.
Upvotes: 1