Hbrandi
Hbrandi

Reputation: 171

Conditional counting in R

I have a question I hope some of you might help me with. I am doing a thesis on pharmaceuticals and the effect from parallelimports. I am dealing with this in R, having a Panel Dataset

I need a variable, that counts for a given original product - how many parallelimporters are there for this given time period.

Product_ID     PI     t    
    1          0      1
    1          1      1
    1          1      1
    1          0      2
    1          1      2
    1          1      2
    1          1      2
    1          1      2
    2          0      1
    2          1      1
    2          0      2
    2          1      2
    2          0      3
    2          1      3
    2          1      3
    2          1      3

Ideally what i want here is a new column, like number of PI-products (PI=1) for an original (PI=0) at time, t. So the output would be like:

Product_ID     PI     t     nPIcomp
    1          0      1        2
    1          1      1
    1          1      1
    1          0      2        4
    1          1      2
    1          1      2
    1          1      2
    1          1      2
    2          0      1        1
    2          1      1
    2          0      2        1
    2          1      2
    2          0      3        3
    2          1      3
    2          1      3
    2          1      3

I hope I have made my issue clear :)

Thanks in advance, Henrik

Upvotes: 4

Views: 2063

Answers (3)

flodel
flodel

Reputation: 89057

I would use ave and your two columns Product_ID and t as grouping variables. Then, within each group, apply a function that returns the sum of PI followed by the appropriate number of NAs:

dat <- transform(dat, nPIcomp = ave(PI, Product_ID, t,
                                    FUN = function(z) {
                                      n <- sum(z)
                                      c(n, rep(NA, n))
                                    }))

The same idea can be used with the data.table package if your data is large and speed is a concern.

Upvotes: 2

Simon O&#39;Hanlon
Simon O&#39;Hanlon

Reputation: 59970

Roman's answers gives exactly what you want. In case you want to summarise the data this would be handy, using the plyr pacakge (df is what I have called your data.frame)...

ddply( df , .(Product_ID , t ) , summarise , nPIcomp = sum(PI) )
#  Product_ID t nPIcomp
#1          1 1       2
#2          1 2       4
#3          2 1       1
#4          2 2       1
#5          2 3       3

Upvotes: 1

Roman Luštrik
Roman Luštrik

Reputation: 70643

Something like this?

x <- read.table(text = "Product_ID     PI     t    
    1          0      1
    1          1      1
    1          1      1
    1          0      2
    1          1      2
    1          1      2
    1          1      2
    1          1      2
    2          0      1
    2          1      1
    2          0      2
    2          1      2
    2          0      3
    2          1      3
    2          1      3
    2          1      3", header = TRUE)

find.count <- rle(x$PI)
count <- find.count$lengths[find.count$values == 1]
x[x$PI == 0, "nPIcomp"] <- count

   Product_ID PI t nPIcomp
1           1  0 1       2
2           1  1 1      NA
3           1  1 1      NA
4           1  0 2       4
5           1  1 2      NA
6           1  1 2      NA
7           1  1 2      NA
8           1  1 2      NA
9           2  0 1       1
10          2  1 1      NA
11          2  0 2       1
12          2  1 2      NA
13          2  0 3       3
14          2  1 3      NA
15          2  1 3      NA
16          2  1 3      NA

Upvotes: 3

Related Questions