Reputation: 1571
In a dataframe df I have two categorical variables, say qs and vr. This dataframe is quite large, but suppose that there are 100 different levels in qs, not necessarily following a pattern. The column vr, which as I said is also a categorical variable, has some missing values.
What I want to do is label the missing values that exist in vr according to the corresponding category or value in qs.
I know a priori that there are 9 different categories in qs for which vr has missing values. Say that the label for one category in qs is 102, and that for this category in qs there are missing values in vr.
So, what I want then to do is
Greece
Italy
and so on.
Sadly, my dataframe is very complicated and I don't know how I can reproduce a simple dataframe.
Upvotes: 1
Views: 226
Reputation: 887118
Assuming that there is a 'c' in 'vr' for the 'qs' value of 3, we can use data.table:
library(data.table)
# For each qs group, set every vr to the group's first non-missing vr value
setDT(df)[, vr := na.omit(vr)[1] , by = qs]
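Since the question has no reproducible data, here is a minimal sketch of the grouped fill on a made-up table. The values in `toy` are assumptions chosen only to match the "'c' for 'qs' value of 3" case, not the OP's data.

```r
library(data.table)

# Hypothetical toy data: vr has gaps, and each qs group has at most
# one distinct non-missing vr value.
toy <- data.table(
  qs = c(1, 1, 2, 3, 3, 3),
  vr = c("a", NA, "b", NA, "c", NA)
)

# Within each qs group, take the first non-missing vr and assign it
# to every row of that group (this also fills the NA rows).
toy[, vr := na.omit(vr)[1], by = qs]

toy$vr
# "a" "a" "b" "c" "c" "c"
```

Note that this overwrites every row in the group, so it only makes sense when each 'qs' group has a single non-missing 'vr' value.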
It is not clear whether the OP wants to replace the missing values with the 'vr' value already present in each 'qs' group or with values from somewhere else. If the replacements come from somewhere else, create a key/value dataset and join it with the original dataset on 'qs':
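# Key/value lookup: one replacement label per qs value
df1 <- data.table(qs = 1:4, vr = c("Serbia", "England", "Greece", "USA"))
# Convert qs from factor to numeric so it can be matched against df1$qs
df$qs <- as.numeric(as.character(df$qs))
# Join on qs, fill missing vr from the lookup column (i.vr), then drop i.vr
setDT(df)[df1, on = "qs"][is.na(vr), vr := i.vr][, i.vr := NULL][]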
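If only some of the 'qs' levels appear in the key/value table (as in the question, where 9 of about 100 levels need filling), an update join may be a better fit, because it modifies the original table in place and leaves unmatched rows alone. This is a minimal sketch on made-up data. The names `toy` and `lookup`, the column `label`, and the values are all assumptions, and `fcoalesce()` needs a reasonably recent data.table.

```r
library(data.table)

# Hypothetical toy data: qs is the key, vr has gaps for some keys.
toy <- data.table(
  qs = c(101, 102, 102, 103, 104),
  vr = c("Spain", NA, NA, NA, "France")
)

# Hypothetical lookup: one label for each qs value that needs filling.
lookup <- data.table(qs = c(102, 103), label = c("Greece", "Italy"))

# Update join: for rows of toy that match lookup on qs, keep vr where
# it is present and take the lookup label where it is NA.
# Rows whose qs is not in lookup are left untouched.
toy[lookup, on = "qs", vr := fcoalesce(vr, i.label)]

toy$vr
# "Spain" "Greece" "Greece" "Italy" "France"
```

If 'vr' is a factor in the real data, converting it with `as.character()` first keeps the types in `fcoalesce()` consistent.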
Upvotes: 1