Owen
Owen

Reputation: 33

Count occurrence of a value within a data frame within the rows above it

I'm trying to find a way to create a matrix which counts values from each row of a data frame. I'd like it to recognise the values in each row of the data frame, and count how many times that value has occurred in all rows above the row the value occurs in (not the whole data frame).

The same value will never occur more than once in a single row of the data frame.

For example:

# df:

a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6

matrix result:

0 0 0 (none of the df values have occurred as there are no rows above)

1 0 0 (3 has occurred once above, the others have not occurred)

2 1 0 (3 has occurred twice above, 2 has occurred once above, 6 has not occurred)

0 0 0 (none of the df values have occurred in rows above)

1 3 1 (8 has occurred once, 3 has occurred 3 times, 6 has occurred once)

Upvotes: 2

Views: 116

Answers (5)

Owen
Owen

Reputation: 33

I know we're not supposed to comment with "thanks", but thank you to all. I've marked Brian's response as the most useful because I'm pretty new to R and his was the example I could follow all the way through without needing to look anything up. I'll have fun finding out about all the other ways and new (to me) functions / approaches you've kindly shared though.

Upvotes: 0

Brian Davis
Brian Davis

Reputation: 992

Another...for fun

out<-matrix(1,nrow = nrow(df),ncol = ncol(df))
for(i in 1:nrow(df)){
  out[i,]<-sapply(1:ncol(df),function(z) sum(unlist(df[0:(i-1),]) %in% df[i,z]))
}

out
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    2    1    0
[4,]    0    0    0
[5,]    1    3    1

Upvotes: 0

Deena
Deena

Reputation: 6223

Here is another solution:

df = read.table(text = "a b c
                1 2 3
                3 4 5
                3 2 6
                7 8 9
                8 3 6", header = T)

elements = sort(unique(unlist(df)))
frequency = sapply(elements, # for each element 
                   function(element) {apply(df == element, 1, sum)}) # Sum the number of occurances per row
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]    1    1    1    0    0    0    0    0    0
# [2,]    0    0    1    1    1    0    0    0    0
# [3,]    0    1    1    0    0    1    0    0    0
# [4,]    0    0    0    0    0    0    1    1    1
# [5,]    0    0    1    0    0    1    0    1    0


results = df
for(i in 1:nrow(df)){
  for(j in 1:ncol(df))
    results[i,j] = sum(frequency[1:i-1, # Sum the prevoius rows occurances  
                                 which(df[i,j] == elements)]) # Of the same element
}
# a b c
# 1 0 0 0
# 2 1 0 0
# 3 2 1 0
# 4 0 0 0
# 5 1 3 1

Upvotes: 0

Jaap
Jaap

Reputation: 83275

Three other approaches:

1) with base R:

temp <- stack(df)[c(outer(c(0,5,10), 1:5, '+')),]
temp$val2 <- with(temp, ave(values, values, FUN = seq_along)) - 1
df2 <- unstack(temp, val2 ~ ind)

which gives:

> df2
  a b c
1 0 0 0
2 1 0 0
3 2 1 0
4 0 0 0
5 1 3 1

2) with data.table:

library(data.table)
melt(setDT(df)[, r := .I],
     id = 'r')[order(r), val2 := rowid(value) - 1
               ][, dcast(.SD, rowid(variable) ~ variable, value.var = 'val2')
                 ][, variable := NULL][]

which gives the same result.

3) with the tidyverse:

library(dplyr)
library(tidyr)
df %>% 
  mutate(r = row_number()) %>% 
  gather(k, v, -4) %>% 
  arrange(r) %>% 
  group_by(v) %>% 
  mutate(v2 = row_number() - 1) %>% 
  ungroup() %>% 
  select(r, k, v2) %>% 
  spread(k, v2)

which, off course, also gives the same result.

Upvotes: 0

Gregor Thomas
Gregor Thomas

Reputation: 146224

Here's one way:

# convert to a vector
x = as.vector(t(as.matrix(df)))

# get counts of each unique element (in the right place)
# and add them up
res = rowSums(sapply(unique(x), function(z) {
  r = integer(length(x))
  r[x == z] = 0:(sum(x == z) - 1)
  return(r)
}))

# convert to matrix
res = matrix(res, ncol = ncol(df), byrow = T)
res
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    0    0
# [3,]    2    1    0
# [4,]    0    0    0
# [5,]    1    3    1

Using this data:

df = read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = T)

Upvotes: 1

Related Questions