Replacing missing values of a column vector

Question

I have two categorical variables, say qs and vr, in a dataframe df. The dataframe is quite large, but suppose that there are 100 different levels in qs, not necessarily following a pattern. The column vr, which as I said is also a categorical variable, has some missing values.

What I want to do is label the missing values that exist in vr according to the corresponding category or value in qs.

I know a priori that there are 9 different categories in qs for which vr has missing values. Say that the label for one category in qs is 102, and for this category in qs there are missing values in vr.

So what I want to do is:

if the category/label/value in qs==102 set vr==Greece
if the category/label/value in qs==250 set vr==Italy

and so on.

Sadly, my dataframe is very complicated and I don't know how I can reproduce a simple dataframe.

akrun · Accepted Answer

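Assuming that each 'qs' group has at least one non-missing 'vr' value (for example, a 'c' in 'vr' for a 'qs' value of 3), we can use data.table to fill each group with its first non-missing 'vr'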

library(data.table)
setDT(df)[, vr := na.omit(vr)[1] , by = qs]
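Since the question has no reproducible data, here is a minimal sketch of the grouped fill on a made-up table. The values in df below are hypothetical and only illustrate the behaviour; it assumes 'vr' is a character column and that each 'qs' group has a single distinct non-missing 'vr'.

```r
library(data.table)

# Hypothetical toy data: 'vr' is partly missing within each 'qs' group
df <- data.table(
  qs = c(1, 1, 3, 3, 3),
  vr = c("a", NA, NA, "c", NA)
)

# Within each 'qs' group, overwrite 'vr' with the first non-missing value
df[, vr := na.omit(vr)[1], by = qs]

print(df)
```

Note that this overwrites every 'vr' in a group, not only the missing ones, so it is only appropriate when 'vr' is constant within each 'qs'. A group whose 'vr' values are all missing stays NA.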

It is not clear whether the OP wanted to replace the missing values with the unique elements for 'vr' for each 'qs' or from some other values. If it is to replace with some other values, create a key/value dataset and join with the original dataset on 'qs'

df1 <- data.table(qs = 1:4, vr = c("Serbia", "England", "Greece", "USA"))
df$qs <- as.numeric(as.character(df$qs))
setDT(df)[df1, on = "qs"][is.na(vr), vr := i.vr][, i.vr := NULL][]
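A possible alternative, in case the key/value table covers only some of the 'qs' values (the question mentions 9 categories out of about 100): as far as I can tell, the join above returns only rows matched to df1, so rows of df whose 'qs' is absent from the key table would be dropped. An update join modifies df in place and leaves the other rows untouched. This is a sketch on hypothetical data; the tables df and lookup and the column new_vr are made up for illustration, and 'vr' is assumed to be character rather than factor.

```r
library(data.table)

# Hypothetical toy data: some 'vr' labels are missing
df <- data.table(
  qs = c(102, 102, 250, 250, 7),
  vr = c(NA, NA, NA, "Italy", "Spain")
)

# Hypothetical key/value table covering only the codes that need filling
lookup <- data.table(qs = c(102, 250), new_vr = c("Greece", "Italy"))

# Update join: for rows of df that match lookup on 'qs',
# replace 'vr' only where it is missing; unmatched rows are left as they are
df[lookup, on = "qs", vr := ifelse(is.na(vr), i.new_vr, vr)]

print(df)
```

If 'vr' is a factor in the real data, converting it with as.character() first avoids problems with labels that are not yet among its levels.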
