Change the values of a column based on another dataframe condition

Question

I have a dataframe that looks something like this.

Var1    Var2
123     ADH6
123     ADH6
123     ADH6
123     ?
56756   CYP4V2
56756   ?
56756   ?
rs98    NUP62
rs98    ?
rs98    NUP62
rs98    NUP62

Var1 and var2 should be consistent 1 to 1. So whenever you have a value in var1 you should have the correspondant value in Var2.

It’s more clear if you visualize.

Var1    Var2
123     ADH6
123     ADH6
123     ADH6
123     ADH6
56756   CYP4V2
56756   CYP4V2
56756   CYP4V2
rs98    NUP62
rs98    NUP62
rs98    NUP62
rs98    NUP62

The problem is that I have thousands of different values. So I can’t create a rule for these specific cases, I need a rule that says something like: if Var2 is missing, then fill with the value of Var2 when same Var1.

In order to solve this, I though of, first, getting the average of the group var2, and creating a dataframe that would act as a dictionary. Then using this dictionary to fill the missing values.

I have this RepEx, but I miss the last part of the code:

library(dplyr)

first_column <- c("rs1","rs1","rs1","rs2","rs2","rs2","rs2","rs2","rs3","rs3","rs3","rs3","rs3")
second_column <- c("APOA","APOA",NA,"SIGMA",NA,NA,"SIGMA","SIGMA","SIGMA","ALPHA","ALPHA",NA,NA)

 
df <- data.frame(first_column, second_column)
df_noNa <- df[!is.na(df$second_column),]

df1 <- df_noNa %>%
  group_by(second_column) %>%
  count(first_column, second_column) %>%
  slice(which.max(n)) %>%
  select(-n)

Change the values of a column based on another dataframe condition

Answers (1)

Related Questions