Reputation:
I have a dataset with over 100 000 rows. I would like to find a number of appearances in a specific column for each row and save it to another column (see example below).
I could iterate through entire dataset for each row but that would be 100k * 100k iterations. Is there any more efficient way to do that?
Input dataset
A B
1 6
3 1
2 6
4 2
1 4
9 1
Output dataset
A B number_of_appearances (based on column B)
1 6 2
3 1 2
2 6 2
4 2 1
1 4 1
9 1 2
Upvotes: 2
Views: 1896
Reputation: 886938
We can use ave
from base R
df1$appearance_in_b <- with(df1, ave(B, B, FUN=length))
df1$appearance_in_b
#[1] 2 2 2 1 1 2
Upvotes: 1
Reputation: 2469
Just to add the data.table
approach:
library(data.table)
dt <- data.table(A = c(1, 3, 2, 4, 1, 9), B = c(6, 1, 6, 2, 4, 1))
dt[, number_of_appearances := .N, by = "B"]
print(dt)
A B number_of_appearances
1: 1 6 2
2: 3 1 2
3: 2 6 2
4: 4 2 1
5: 1 4 1
6: 9 1 2
Upvotes: 0
Reputation: 860
And without dplyr
:
# create the dataframe
x = sample(1:3, 10, TRUE);
y = sample(c("a","b","c"), 10, TRUE);
d = data.frame(x,y);
# get the frequencies of y
tb = table(d$y);
tb = as.data.frame(tb);
# make an "SQL join-like" merging of the two data-frames
res = merge(d,tb,by.x="y",by.y="Var1", sort=FALSE);
Upvotes: 1
Reputation: 1247
You can use dplyr
for that:
library(dplyr)
a <- c(2,1,2,3,4,3,2,1,4)
b <- c(3,2,1,2,3,4,3,2,1)
df <- data.frame(a, b)
df %>%
group_by(b) %>%
mutate(appearences_in_b = n())
Source: local data frame [9 x 3]
Groups: b [4]
a b appearences_in_b
<dbl> <dbl> <int>
1 2 3 3
2 1 2 3
3 2 1 2
4 3 2 3
5 4 3 3
6 3 4 1
7 2 3 3
8 1 2 3
9 4 1 2
Upvotes: 2