HarmlessEcon
HarmlessEcon

Reputation: 75

Count number of values which are less than current value

I'd like to count the rows in the column input if the values are smaller than the current row (Please see the results wanted below). The issue to me is that the condition is based on current row value, so it is very different from general case where the condition is a fixed number.

data <- data.frame(input = c(1,1,1,1,2,2,3,5,5,5,5,6))

    input
1      1
2      1
3      1
4      1
5      2
6      2
7      3
8      5
9      5
10     5
11     5
12     6

The results I expect to get are like this. For example, for observations 5 and 6 (with value 2), there are 4 observations with value 1 less than their value 2. Hence count is given value 4.

    input count
1      1     0
2      1     0
3      1     0
4      1     0
5      2     4
6      2     4
7      3     6
8      5     7
9      5     7
10     5     7
11     5     7
12     6    11

Edit: as I am dealing with grouped data with dplyr, the ultimate results I wish to get is like below, that is, I am wishing the conditions could be dynamic within each group.

data <- data.frame(id = c(1,1,2,2,2,3,3,4,4,4,4,4), 
input = c(1,1,1,1,2,2,3,5,5,5,5,6), 
count=c(0,0,0,0,2,0,1,0,0,0,0,4))

   id input count
1   1     1     0
2   1     1     0
3   2     1     0
4   2     1     0
5   2     2     2
6   3     2     0
7   3     3     1
8   4     5     0
9   4     5     0
10  4     5     0
11  4     5     0
12  4     6     4

Upvotes: 5

Views: 5087

Answers (3)

Henrik
Henrik

Reputation: 67778

1. outer and rowSums

data$count <- with(data, rowSums(outer(input, input, `>`)))

2. table and cumsum

tt <- cumsum(table(data$input))
v <- setNames(c(0, head(tt, -1)), c(head(names(tt), -1), tail(names(tt), 1)))
data$count <- v[match(data$input, names(v))]

3. data.table non-equi join

Perhaps more efficient with a non-equi join in data.table. Count number of rows (.N) for each match (by = .EACHI).

library(data.table)
setDT(data)
data[data, on = .(input < input), .N, by = .EACHI]

If your data is grouped by 'id', as in your update, join on that variable as well:

data[data, on = .(id, input < input), .N, by = .EACHI]

#     id input N
#  1:  1     1 0
#  2:  1     1 0
#  3:  2     1 0
#  4:  2     1 0
#  5:  2     2 2
#  6:  3     2 0
#  7:  3     3 1
#  8:  4     5 0
#  9:  4     5 0
# 10:  4     5 0
# 11:  4     5 0
# 12:  4     6 4

Upvotes: 4

akrun
akrun

Reputation: 887301

Here is an option with tidyverse

library(tidyverse)
data %>%
   mutate(count = map_int(input, ~ sum(.x > input))) 
#    input count
#1      1     0
#2      1     0
#3      1     0
#4      1     0
#5      2     4
#6      2     4
#7      3     6
#8      5     7
#9      5     7
#10     5     7
#11     5     7
#12     6    11

Update

With the updated data, add the group by 'id' in the above code

data %>% 
  group_by(id) %>% 
  mutate(count1 = map_int(input, ~ sum(.x > input)))
# A tibble: 12 x 4
# Groups:   id [4]
#      id input count count1
#   <dbl> <dbl> <dbl>  <int>
# 1     1     1     0      0
# 2     1     1     0      0
# 3     2     1     0      0
# 4     2     1     0      0
# 5     2     2     2      2
# 6     3     2     0      0
# 7     3     3     1      1
# 8     4     5     0      0
# 9     4     5     0      0
#10     4     5     0      0
#11     4     5     0      0
#12     4     6     4      4

Upvotes: 5

Ronak Shah
Ronak Shah

Reputation: 389055

In base R, we can use sapply and for each input count how many values are greater than itself.

data$count <- sapply(data$input, function(x) sum(x > data$input))

data

#   input count
#1      1     0
#2      1     0
#3      1     0
#4      1     0
#5      2     4
#6      2     4
#7      3     6
#8      5     7
#9      5     7
#10     5     7
#11     5     7
#12     6    11

With dplyr one way would be using rowwise function and following the same logic.

library(dplyr)

data %>%
  rowwise() %>%
  mutate(count = sum(input > data$input))

Upvotes: 4

Related Questions