PassaroBagante
PassaroBagante

Reputation: 99

Change the values of a column based on another dataframe condition

I have a dataframe that looks something like this.

Var1    Var2
123     ADH6
123     ADH6
123     ADH6
123     ?
56756   CYP4V2
56756   ?
56756   ?
rs98    NUP62
rs98    ?
rs98    NUP62
rs98    NUP62

Var1 and var2 should be consistent 1 to 1. So whenever you have a value in var1 you should have the correspondant value in Var2.

It’s more clear if you visualize.

Var1    Var2
123     ADH6
123     ADH6
123     ADH6
123     ADH6
56756   CYP4V2
56756   CYP4V2
56756   CYP4V2
rs98    NUP62
rs98    NUP62
rs98    NUP62
rs98    NUP62

The problem is that I have thousands of different values. So I can’t create a rule for these specific cases, I need a rule that says something like: if Var2 is missing, then fill with the value of Var2 when same Var1.

In order to solve this, I though of, first, getting the average of the group var2, and creating a dataframe that would act as a dictionary. Then using this dictionary to fill the missing values.

I have this RepEx, but I miss the last part of the code:

library(dplyr)

first_column <- c("rs1","rs1","rs1","rs2","rs2","rs2","rs2","rs2","rs3","rs3","rs3","rs3","rs3")
second_column <- c("APOA","APOA",NA,"SIGMA",NA,NA,"SIGMA","SIGMA","SIGMA","ALPHA","ALPHA",NA,NA)

 
df <- data.frame(first_column, second_column)
df_noNa <- df[!is.na(df$second_column),]

df1 <- df_noNa %>%
  group_by(second_column) %>%
  count(first_column, second_column) %>%
  slice(which.max(n)) %>%
  select(-n)

Upvotes: 0

Views: 41

Answers (1)

TarJae
TarJae

Reputation: 79311

We coult group and use fill:

Update please see comment from onyambu(many thanks)

In order to address order issues: .direction = "updown or .direction = "downup"

library(tidyverse)

first_column <- c("rs1","rs1","rs1","rs2","rs2","rs2","rs2","rs2","rs3","rs3","rs3","rs3","rs3")
second_column <- c("APOA","APOA",NA,"SIGMA",NA,NA,"SIGMA","SIGMA","SIGMA","ALPHA","ALPHA",NA,NA)

df <- tibble(first_column, second_column)

df %>% 
  group_by(first_column) %>% 
  fill(second_column, .direction = "down")
   first_column second_column
   <chr>        <chr>        
 1 rs1          APOA         
 2 rs1          APOA         
 3 rs1          APOA         
 4 rs2          SIGMA        
 5 rs2          SIGMA        
 6 rs2          SIGMA        
 7 rs2          SIGMA        
 8 rs2          SIGMA        
 9 rs3          SIGMA        
10 rs3          ALPHA        
11 rs3          ALPHA        
12 rs3          ALPHA        
13 rs3          ALPHA 

Upvotes: 2

Related Questions