Efficiently replace values of a data frame column with relative frequency in R

Question

I have a huge dataset (~7 GB) and I need to EFFICIENTLY replace one variable (user id) with its relative frequency, i.e. the number of rows with that user_id divided by the number of distinct user_ids. Minimal example:

id <- c(1050, 1324, 5, 7, 1050, 7, 8)
table(id)

id
   5    7    8 1050 1324 
   1    2    1    2    1

Then I tried

freq <- ave(id, id, FUN = function(X) length(X) / length(unique(id)))
df <- data.frame(id = id, freq = freq)

Output:

    id freq
1 1050  0.4
2 1324  0.2
3    5  0.2
4    7  0.4
5 1050  0.4
6    7  0.4
7    8  0.2

But on my data set this solution has been running for three(!) hours already. Any help is appreciated :)

R 2 minutes tutorials · Accepted Answer

Here is a tidyverse implementation:

library(tidyverse)
id <- c(1050, 1324, 5, 7, 1050, 7, 8)

my_df <- tibble(id = id)  # create the data frame (data_frame() is deprecated)

my_df %>%
  mutate(unique = n_distinct(id)) %>%  # number of distinct ids, computed once before grouping
  group_by(id) %>%                     # group by id
  mutate(
    n = n(),                           # number of observations in the current group
    freq = n / unique                  # n / unique gives the relative frequency
  )
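
For comparison, here is a minimal base R sketch of the same computation that counts each id once and looks the count up per row, instead of calling unique() inside every group as the ave() attempt does. It assumes the id column fits in memory as a plain vector; uid, idx, counts and n_unique are illustrative names, not part of the code above, and the speed on a 7 GB data set is untested.

```r
id <- c(1050, 1324, 5, 7, 1050, 7, 8)

# Distinct ids, in order of first appearance.
uid <- unique(id)

# Number of distinct ids, computed a single time.
n_unique <- length(uid)

# For every row, the position of its id within uid.
idx <- match(id, uid)

# How many rows fall on each distinct id.
counts <- tabulate(idx, nbins = n_unique)

# Per-row count divided by the number of distinct ids.
freq <- counts[idx] / n_unique

df <- data.frame(id = id, freq = freq)
```

On the example vector this should give the same freq column as the ave() version in the question.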

If you want to learn more about group_by, check this tutorial: https://www.youtube.com/watch?v=70UcgabaB_I&t=14s
