Reputation: 83

R - Filling missing values (blanks) based upon values on the same row but different column

I'm using R and have the following sample data frame, in which all variables are factors:

  first            second  third
 social     birth control   high
            birth control   high
medical  Anorexia Nervosa    low
medical  Anorexia Nervosa    low
               Alcoholism   high
 family        Alcoholism   high

Basically, I need a function that fills the blanks in the first column based on the values in the second and third columns. For instance, if the second column is "birth control" and the third column is "high", the blank in the first column should become "social". If they are "Alcoholism" and "high" respectively, the blank should become "family".

Upvotes: 3

Answers (3)

akrun

Reputation: 887851

Based on the data shown, it is not very clear whether 'first' can hold other values for each combination of 'second' and 'third'. If there is only a single value per combination and you need to replace the '' with that, then you could try:

library(data.table)
setDT(df1)[, replace(first, first == '', first[first != '']),
           by = list(second, third)]

Or, more efficiently, assign by reference with :=

setDT(df1)[, first:= first[first!=''] , list(second, third)]
#     first           second third
#1:  social    birth control  high
#2:  social    birth control  high
#3: medical Anorexia Nervosa   low
#4: medical Anorexia Nervosa   low
#5:  family       Alcoholism  high
#6:  family       Alcoholism  high
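
If a group might contain several non-blank values, or none at all, a slightly more defensive variant could look like the sketch below. It is only an illustration: the extra "Insomnia" row and the helper column name filler are assumptions added here, not part of the original data. It also assumes character columns; factor columns could be converted with as.character() first.

```r
library(data.table)

# Sample data from the question, plus one hypothetical blank-only group
# ("Insomnia") to show what happens when no donor value exists.
dt <- data.table(
  first  = c("social", "", "medical", "medical", "", "family", ""),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism", "Insomnia"),
  third  = c("high", "high", "low", "low", "high", "high", "low")
)

# Within each (second, third) group, take the first non-blank value.
# Indexing with [1L] gives NA when the group has no non-blank value and
# avoids length mismatches when it has several.
dt[, filler := first[first != ""][1L], by = .(second, third)]

# Overwrite only the blanks that have a donor value, then drop the helper.
dt[first == "" & !is.na(filler), first := filler]
dt[, filler := NULL]
```

The blank in the "Insomnia" group stays blank because there is nothing in that group to copy from.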

data

df1 <- structure(list(first = c("social", "", "medical", "medical", 
"", "family"), second = c("birth control", "birth control", 
"Anorexia Nervosa", 
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high", 
"high", "low", "low", "high", "high")), .Names = c("first", "second", 
"third"), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 3

dimitris_ps

Reputation: 5951

Another approach with dplyr, building on @akrun's very nice solution:

library(dplyr)

df1 %>% 
  group_by(second, third) %>% 
  mutate(first = replace(first, first == '', first[first != ''])) %>% 
  ungroup()
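
A variation on the same grouped idea, sketched here under the assumption that the columns are character (not factor) and that a reasonably recent tidyr is installed: turn the blanks into NA and let tidyr::fill() copy the nearest non-missing value within each group. The result name filled is just a placeholder.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, as character columns.
df1 <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

filled <- df1 %>%
  mutate(first = na_if(first, "")) %>%      # treat blanks as missing
  group_by(second, third) %>%
  fill(first, .direction = "downup") %>%    # fill down, then up, within each group
  ungroup()
```

Groups with no non-missing value at all would simply keep NA in first.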

Data

df1 <- structure(list(first = c("social", "", "medical", "medical", 
"", "family"), second = c("birth control", "birth control", 
"Anorexia Nervosa", 
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high", 
"high", "low", "low", "high", "high")), .Names = c("first", "second", 
"third"), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193677

One way would be to create a lookup list of some sort (for example, a named vector, a factor, or something similar) and then replace any "" values with the values from the lookup list.

Here's an example (though I think that your problem is not fully defined and perhaps overly simplified).

library(dplyr)
library(tidyr)

mydf %>%
  unite(condition, second, third, remove = FALSE) %>%
  mutate(condition = factor(condition, 
                            c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
                            c("social", "medical", "family"))) %>%
  mutate(condition = as.character(condition)) %>%
  mutate(first = replace(first, first == "", condition[first == ""])) %>%
  select(-condition)
#     first           second third
# 1  social    birth control  high
# 2  social    birth control  high
# 3 medical Anorexia Nervosa   low
# 4 medical Anorexia Nervosa   low
# 5  family       Alcoholism  high
# 6  family       Alcoholism  high

A "data.table" approach would follow the same steps, with the advantage that each := step modifies the table by reference instead of copying it (note that as.data.table() itself makes one initial copy of mydf).

library(data.table)
as.data.table(mydf)[
  , condition := sprintf("%s_%s", second, third)][
    , condition := as.character(
      factor(condition, 
             c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
             c("social", "medical", "family")))][
               first == "", first := condition][
                 , condition := NULL][]
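
For completeness, the named-vector flavour of the lookup idea can be sketched in base R as well. The names lookup, key and blank are illustrative only, and the columns are assumed to be character (convert factors with as.character() first).

```r
# Same sample data as in the question, as character columns.
mydf <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Named vector: names are "second_third" keys, values are what to fill in.
lookup <- c("birth control_high"   = "social",
            "Anorexia Nervosa_low" = "medical",
            "Alcoholism_high"      = "family")

# Build the key for every row, then fill only the blank rows.
key   <- paste(mydf$second, mydf$third, sep = "_")
blank <- mydf$first == ""
mydf$first[blank] <- unname(lookup[key[blank]])
```

Any key that is missing from lookup would come back as NA, which makes unmatched combinations easy to spot afterwards.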

Upvotes: 2
