msh855

Reputation: 1571

Replacing missing values of a column vector

I have two categorical variables, say qs and vr, in a dataframe df. The dataframe is quite large, but suppose that there are 100 different levels in qs, not necessarily following a pattern. The column vr, which as I said is also a categorical variable, has some missing values.

What I want to do is label the missing values that exist in vr according to the corresponding category or value in qs.

I know a priori that there are 9 different categories in qs for which vr has missing values. Say that the label for one category in qs is 102, and for this category in qs there are missing values in vr.

So what I then want to do is

and so on.

Sadly, my dataframe is very complicated and I don't know how to produce a simple reproducible example from it.

Upvotes: 1

Views: 226

Answers (1)

akrun

Reputation: 887118

Assuming that there is a 'c' in 'vr' for 'qs' value of 3, we can use data.table

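library(data.table)
# Convert df to a data.table by reference, then within each 'qs' group
# set 'vr' to the first non-missing 'vr' value of that group
setDT(df)[, vr := na.omit(vr)[1] , by = qs]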
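
Since the question has no reproducible data, here is a minimal sketch of the same idea on a made-up dataset. The values in `df` below are hypothetical, and the sketch assumes every `qs` group has exactly one distinct non-missing `vr` value.

```r
library(data.table)

# Hypothetical toy data: 'vr' is missing in some rows of each 'qs' group
df <- data.frame(
  qs = c(1, 1, 2, 2, 3, 3),
  vr = c("a", NA, NA, "b", "c", NA),
  stringsAsFactors = FALSE
)

# Within each 'qs' group, overwrite 'vr' with the group's first non-missing value
setDT(df)[, vr := na.omit(vr)[1], by = qs]

print(df)
```

Note that this overwrites every `vr` in a group, not only the missing ones, so it is only appropriate when `vr` is constant within each `qs`. A group whose `vr` values are all missing stays `NA`.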

It is not clear whether the OP wants to replace the missing values with the unique 'vr' value found in each 'qs' group or with values from some other source. If the replacements come from some other source, create a key/value dataset and join it with the original dataset on 'qs'

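# Key/value table: one replacement 'vr' per 'qs'
df1 <- data.table(qs = 1:4, vr = c("Serbia", "England", "Greece", "USA"))
# Convert 'qs' from factor to numeric so it can be joined with df1$qs
df$qs <- as.numeric(as.character(df$qs))
# Join on 'qs', fill missing 'vr' from df1's 'vr' (i.vr), then drop the helper column
setDT(df)[df1, on = "qs"][is.na(vr), vr := i.vr][, i.vr := NULL][]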
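
A join of the form `df[df1, on = "qs"]` returns a new table driven by the rows of `df1`, so rows of `df` whose `qs` has no entry in `df1` are not part of the result. If the lookup table covers only the categories that have missing values, an update join that modifies `df` in place may be a better fit. The following is a sketch of that alternative, not the approach above. The data is made up, and it assumes `qs` is unique in `df1`.

```r
library(data.table)

# Hypothetical data: 'qs' value 5 has no entry in the lookup table
df <- data.table(
  qs = c(1, 2, 2, 3, 5),
  vr = c(NA, "England", NA, NA, "Spain")
)

# Key/value table (assumed to have one row per 'qs')
df1 <- data.table(qs = c(1, 2, 3), vr = c("Serbia", "England", "Greece"))

# For the rows of df where 'vr' is missing, look up df1's 'vr' (x.vr) by 'qs'
# and assign it by reference; all other rows are left untouched
df[is.na(vr), vr := df1[.SD, on = "qs", x.vr]]

print(df)
```

Rows with a missing `vr` and no matching `qs` in `df1` simply remain `NA`.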

Upvotes: 1
