rnorouzian
rnorouzian

Reputation: 7517

add summary `n` from one dataframe to another dataframe (tidyverse)

I was wondering if there might be a way to replace the column fpc in DATA2 with corresponding fpc obtained from DATA1?

library(tidyverse)

dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/d/master/su.csv')

## 10000 rows ################
 DATA1 <- dat %>%
    group_by(across(all_of(c("gender", "pre")))) %>%
    summarise(n = n(), .groups = 'drop') %>%
    mutate(fpc = n/sum(n)) %>% 
    right_join(dat)


dat2 <- read.csv('https://raw.githubusercontent.com/rnorouzian/d/master/out.csv')
  
## 200 rows #################
DATA2 <- dat2 %>%
  group_by(across(all_of(c("gender", "pre")))) %>%
  summarise(n = n(), .groups = 'drop') %>%
  mutate(fpc = n/sum(n)) %>% 
  right_join(dat2)

Upvotes: 1

Views: 90

Answers (2)

akrun
akrun

Reputation: 887078

It would be more efficient to do this in data.table

library(data.table)
setDT(DATA2)[as.data.table(unique(DATA1[c('gender', 'pre', 'fpc')])),
            fpc := i.fpc, on = .(gender, pre)]

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388962

You can join the dataframe and use coalesce to select fpc from DATA2.

library(dplyr)

result <- DATA2 %>%
           left_join(DATA1 %>% distinct(gender, pre, fpc), 
                     by = c('gender', 'pre')) %>%
           mutate(fpc = coalesce(fpc.y, fpc.x)) %>%
           select(names(DATA2))

nrow(result)
#[1] 200

Upvotes: 1

Related Questions