Yunus YILDIRIM
Yunus YILDIRIM

Reputation: 91

How can I find the number of a vector's elements in another vector?

I have two vectors. First vector name is comments$author_id and second is enrolments$learner_id. I want to add new column into enrolmens dataframe that shows count of repeated rows in comments$author_id vector for each enrolment$learner_id row.

Example:

 if(enrolments$learner_id[1] repeated 5 times in comments$author_id)
    enrolments$freqs[1] = 5

Can I do this don't using any loops?

The vector samples are as follows:

df1 <- data.frame(v1 = c(1,1,1,4,5,5,4,1,2,3,5,6,2,1,5,2,3,4,1,6,4,2,3,5,1,2,5,4))
df2 <- data.frame(v2 = c(1,2,3,4,5,6))

I want to add "counts" column to "df2" that shows counts of repeated v2 element in v1.

Upvotes: 1

Views: 101

Answers (3)

lukeA
lukeA

Reputation: 54237

"[tabulate] gives me this error: Error in $<-.data.frame(tmp, "comments_count", value = c(0L, 0L, : replacement has 25596 rows, data has 25597"

That is prly because there is one value at the end of df2$v2, which are not part of df1$v1 - I add 0 and 7 to your example to show that:

df1 <- data.frame(v1 = c(1,1,1,4,5,5,4,1,2,3,5,6,2,1,5,2,3,4,1,6,4,2,3,5,1,2,5,4))
df2 <- data.frame(v2 = c(1,2,3,0,4,5,6,7))
df2$count <- tabulate(factor(df1$v1, df2$v2))
# Error in `$<-.data.frame`(`*tmp*`, count, value = c(7L, 5L, 3L, 0L, 5L,  : 
#   replacement has 7 rows, data has 8

To correct that using tabulate, which might be the fastest solution on larger data:

df2$count <- tabulate(factor(df1$v1, df2$v2), length(df2$v2))
df2
# v2 count
# 1  1     7
# 2  2     5
# 3  3     3
# 4  0     0
# 5  4     5
# 6  5     6
# 7  6     2
# 8  7     0

See ?tabulate for the documentation on that function.

Upvotes: 1

thelatemail
thelatemail

Reputation: 93803

What you essentially are doing is aggregating the df1 to get a count, and then adding this count back to the df2 set. This logic can be easily translated to a bunch of different methods:

# base R
merge(
  df2,
  aggregate(cbind(df1[0], count=1), df1["v1"], FUN=sum),
  by.x="v2", by.y="v1", all.x=TRUE
)

# data.table
library(data.table)
setDT(df1)
setDT(df2)
df2[df1[, .(count=.N), by=v1], on=c("v2"="v1")]

# dplyr
library(dplyr)
df1 %>%
  group_by(v1) %>%
  count() %>%
  left_join(df2, ., by=c("v2"="v1"))

#  v2 count
#1  1     7
#2  2     5
#3  3     3
#4  4     5
#5  5     6
#6  6     2

Upvotes: 1

jruf003
jruf003

Reputation: 1042

Using your df1 and df2 example, you could do it like this:

# Make data
df1 = data.frame(v1 = c(1,1,1,4,5,5,4,1,2,3,5,6,2,1,5,2,3,4,1,6,4,2,3,5,1,2,5,4))
df2 = data.frame(v2 = c(1,2,3,4,5,6))

# Add 'count' variable as reqeuested
df2$counts = sapply(df2$v2, function(x) {
  sum(df1$v1 == x, na.rm = T) #na.rm=T just in case df1$v1 has missing values
})
df2 #view output

Upvotes: 1

Related Questions