Reputation:

R count and store number of duplicates to another column

I have a dataset with over 100 000 rows. I would like to find a number of appearances in a specific column for each row and save it to another column (see example below).

I could iterate through entire dataset for each row but that would be 100k * 100k iterations. Is there any more efficient way to do that?

Input dataset

Output dataset

A B number_of_appearances (based on column B)
1 6    2
3 1    2
2 6    2
4 2    1
1 4    1
9 1    2

Upvotes: 2

Answers (4)

akrun

Reputation: 887881

We can use ave from base R

df1$appearance_in_b <- with(df1, ave(B, B, FUN=length))
df1$appearance_in_b
#[1] 2 2 2 1 1 2

Upvotes: 1

Jaime Caffarel

Reputation: 2469

Just to add the data.table approach:

library(data.table)
dt <- data.table(A = c(1, 3, 2, 4, 1, 9), B = c(6, 1, 6, 2, 4, 1))

dt[, number_of_appearances := .N, by = "B"]

print(dt)
   A B number_of_appearances
1: 1 6                     2
2: 3 1                     2
3: 2 6                     2
4: 4 2                     1
5: 1 4                     1
6: 9 1                     2

Upvotes: 0

Bruno Zamengo

Reputation: 870

And without dplyr:

# create the dataframe
x = sample(1:3, 10, TRUE);
y = sample(c("a","b","c"), 10, TRUE);
d = data.frame(x,y);

# get the frequencies of y
tb = table(d$y);
tb = as.data.frame(tb);

# make an "SQL join-like" merging of the two data-frames
res = merge(d,tb,by.x="y",by.y="Var1", sort=FALSE);

Upvotes: 1

ottlngr

Reputation: 1247

You can use dplyr for that:

library(dplyr)

a <- c(2,1,2,3,4,3,2,1,4)
b <- c(3,2,1,2,3,4,3,2,1)

df <- data.frame(a, b)

df %>%
  group_by(b) %>%
  mutate(appearences_in_b = n())

Source: local data frame [9 x 3]
Groups: b [4]

     a     b appearences_in_b
   <dbl> <dbl>            <int>
1     2     3                3
2     1     2                3
3     2     1                2
4     3     2                3
5     4     3                3
6     3     4                1
7     2     3                3
8     1     2                3
9     4     1                2

Upvotes: 2

R count and store number of duplicates to another column

Answers (4)

Related Questions