Mark
Mark

Reputation: 67

Count number of times a value increases in a column

I have a data frame in R that looks something like this:

person  purch_date   num_purchased
Alex    2011-01-01   6
Alex    2011-06-05   5
Alex    2012-03-04   6
Beth    2011-02-04   7
Beth    2012-05-21   8
Beth    2013-11-11   10
Candy   2007-08-09   3
Candy   2009-10-01   2
Candy   2013-12-02   2

I have sorted it first by "person" and second by "purch_date".

I am trying to count how many increases in the "num_purchased" column occur for each person. I know the code to count the number of unique values for the column or how many times the value changed, but this is not what I am looking for as I only want to know how many times the value increased from one date to the next. Ideally, the output would look something like:

person  num_increases
Alex    1
Beth    2
Candy   0

Upvotes: 1

Views: 372

Answers (3)

Rich Scriven
Rich Scriven

Reputation: 99341

If you want a vector, you could use tapply. df is your original data.

foo <- function(x) sum(diff(x) > 0)

with(df, tapply(num_purchased, person, foo))
# Alex  Beth Candy 
#    1     2     0 

Or with dplyr and still using foo

library(dplyr)
group_by(df, person) %>% summarize(increases = foo(num_purchased))
#   person increases
# 1   Alex         1
# 2   Beth         2
# 3  Candy         0

Upvotes: 1

KFB
KFB

Reputation: 3501

Here is a data.table approach which will only give you result with increases > 0, i.e. those = 0 will not be tabulated. FYR though

library(data.table)
setDT(df)  ## set your data frame as data table
df[, diff(num_purchased), by=person][V1>0, .N, by=person]
#    person N
# 1:   Alex 1
# 2:   Beth 2

Edit.

Incorporate @Arun's comment. More compact and get the "0" count.

df[, sum(diff(num_purchased) > 0), by=person]
#    person V1
# 1:   Alex  1
# 2:   Beth  2
# 3:  Candy  0

Upvotes: 2

josliber
josliber

Reputation: 44330

You can get data in the format you want in a 1-liner with aggregate:

aggregate(num_purchased~person, data=dat, function(x) sum(diff(x) > 0))
#   person num_purchased
# 1   Alex             1
# 2   Beth             2
# 3  Candy             0

If you instead prefer a named vector for the output, I would suggest tapply as described in Richard Scriven's answer.

Upvotes: 1

Related Questions