Reputation: 33
I'm trying to find a way to create a matrix which counts values from each row of a data frame. I'd like it to recognise the values in each row of the data frame, and count how many times that value has occurred in all rows above the row the value occurs in (not the whole data frame).
The same value will never occur more than once in a single row of the data frame.
For example:
# df:
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6
matrix result:
0 0 0 (none of the df values have occurred as there are no rows above)
1 0 0 (3 has occurred once above, the others have not occurred)
2 1 0 (3 has occurred twice above, 2 has occurred once above, 6 has not occurred)
0 0 0 (none of the df values have occurred in rows above)
1 3 1 (8 has occurred once, 3 has occurred 3 times, 6 has occurred once)
Upvotes: 2
Views: 116
Reputation: 33
I know we're not supposed to comment with "thanks", but thank you to all. I've marked Brian's response as the most useful because I'm pretty new to R and his was the example I could follow all the way through without needing to look anything up. I'll have fun finding out about all the other ways and new (to me) functions / approaches you've kindly shared though.
Upvotes: 0
Reputation: 992
Another...for fun
out<-matrix(1,nrow = nrow(df),ncol = ncol(df))
for(i in 1:nrow(df)){
out[i,]<-sapply(1:ncol(df),function(z) sum(unlist(df[0:(i-1),]) %in% df[i,z]))
}
out
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 2 1 0
[4,] 0 0 0
[5,] 1 3 1
Upvotes: 0
Reputation: 6223
Here is another solution:
df = read.table(text = "a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = T)
elements = sort(unique(unlist(df)))
frequency = sapply(elements, # for each element
function(element) {apply(df == element, 1, sum)}) # Sum the number of occurances per row
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 1 1 0 0 0 0 0 0
# [2,] 0 0 1 1 1 0 0 0 0
# [3,] 0 1 1 0 0 1 0 0 0
# [4,] 0 0 0 0 0 0 1 1 1
# [5,] 0 0 1 0 0 1 0 1 0
results = df
for(i in 1:nrow(df)){
for(j in 1:ncol(df))
results[i,j] = sum(frequency[1:i-1, # Sum the prevoius rows occurances
which(df[i,j] == elements)]) # Of the same element
}
# a b c
# 1 0 0 0
# 2 1 0 0
# 3 2 1 0
# 4 0 0 0
# 5 1 3 1
Upvotes: 0
Reputation: 83275
Three other approaches:
1) with base R:
temp <- stack(df)[c(outer(c(0,5,10), 1:5, '+')),]
temp$val2 <- with(temp, ave(values, values, FUN = seq_along)) - 1
df2 <- unstack(temp, val2 ~ ind)
which gives:
> df2 a b c 1 0 0 0 2 1 0 0 3 2 1 0 4 0 0 0 5 1 3 1
2) with data.table
:
library(data.table)
melt(setDT(df)[, r := .I],
id = 'r')[order(r), val2 := rowid(value) - 1
][, dcast(.SD, rowid(variable) ~ variable, value.var = 'val2')
][, variable := NULL][]
which gives the same result.
3) with the tidyverse
:
library(dplyr)
library(tidyr)
df %>%
mutate(r = row_number()) %>%
gather(k, v, -4) %>%
arrange(r) %>%
group_by(v) %>%
mutate(v2 = row_number() - 1) %>%
ungroup() %>%
select(r, k, v2) %>%
spread(k, v2)
which, off course, also gives the same result.
Upvotes: 0
Reputation: 146224
Here's one way:
# convert to a vector
x = as.vector(t(as.matrix(df)))
# get counts of each unique element (in the right place)
# and add them up
res = rowSums(sapply(unique(x), function(z) {
r = integer(length(x))
r[x == z] = 0:(sum(x == z) - 1)
return(r)
}))
# convert to matrix
res = matrix(res, ncol = ncol(df), byrow = T)
res
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 1 0 0
# [3,] 2 1 0
# [4,] 0 0 0
# [5,] 1 3 1
Using this data:
df = read.table(text = "
a b c
1 2 3
3 4 5
3 2 6
7 8 9
8 3 6", header = T)
Upvotes: 1