Reputation: 83
I'm using R and have the following sample of data frame in which all variables are factors:
first    second            third
social   birth control     high
         birth control     high
medical  Anorexia Nervosa  low
medical  Anorexia Nervosa  low
         Alcoholism        high
family   Alcoholism        high
Basically, I need a function to help me fill the blanks in the first column based upon the values in the second and third columns. For instance, if I have in the second column "birth control" and in the third column "high" I need to fill the blank in the first column with "social". If it is "Alcoholism" and "high" in the second and third column respectively, I need to fill the blanks in the first column with "family".
Upvotes: 3
Views: 3564
Reputation: 887851
Based on the data shown, it is not clear whether 'first' can take other values for a given combination of 'second' and 'third'. If there is only a single value per combination and you need to replace the '' with it, then you could try
library(data.table)
setDT(df1)[, replace(first, first == '', first[first != '']),
           by = list(second, third)]
Or a more efficient method would be
setDT(df1)[, first := first[first != ''], by = list(second, third)]
#     first           second third
#1:  social    birth control  high
#2:  social    birth control  high
#3: medical Anorexia Nervosa   low
#4: medical Anorexia Nervosa   low
#5:  family       Alcoholism  high
#6:  family       Alcoholism  high
df1 <- structure(list(first = c("social", "", "medical", "medical",
"", "family"), second = c("birth control", "birth control",
"Anorexia Nervosa",
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high",
"high", "low", "low", "high", "high")), .Names = c("first", "second",
"third"), class = "data.frame", row.names = c(NA, -6L))
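The caveat above (whether each combination of 'second' and 'third' really has a single non-empty 'first') can be checked while filling. The following is only a minimal base R sketch under stated assumptions: the columns are character rather than factor (convert with as.character() first otherwise), and `fill_group` is a hypothetical helper name, not part of any package.

```r
# Sample data, same values as df1 above (character columns assumed).
df1 <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill the blanks of one group, but stop instead of
# guessing when the group holds more than one distinct non-empty value.
fill_group <- function(x) {
  candidates <- unique(x[x != ""])
  if (length(candidates) > 1L) {
    stop("more than one non-empty value in group: ",
         paste(candidates, collapse = ", "))
  }
  if (length(candidates) == 1L) {
    x[x == ""] <- candidates
  }
  x  # groups with no non-empty value are returned unchanged
}

# ave() applies fill_group within each (second, third) combination.
filled <- df1
filled$first <- ave(df1$first, df1$second, df1$third, FUN = fill_group)
```

If a group contains conflicting values, the call stops with an error rather than silently picking one of them.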
Upvotes: 3
Reputation: 5951
Another approach, with dplyr, using @akrun's very nice solution:
library(dplyr)
df1 %>%
  group_by(second, third) %>%
  mutate(first = replace(first, first == '', first[first != ''])) %>%
  ungroup()
Data
df1 <- structure(list(first = c("social", "", "medical", "medical",
"", "family"), second = c("birth control", "birth control",
"Anorexia Nervosa",
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high",
"high", "low", "low", "high", "high")), .Names = c("first", "second",
"third"), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 1
Reputation: 193677
One way would be to create a lookup list of some sort (for example, using a named vector, a factor, or something similar) and then replace any "" values with the values from the lookup list.
Here's an example (though I think that your problem is not fully defined and perhaps overly simplified).
library(dplyr)
library(tidyr)
mydf %>%
  unite(condition, second, third, remove = FALSE) %>%
  mutate(condition = factor(condition,
    c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
    c("social", "medical", "family"))) %>%
  mutate(condition = as.character(condition)) %>%
  mutate(first = replace(first, first == "", condition[first == ""])) %>%
  select(-condition)
#     first           second third
# 1  social    birth control  high
# 2  social    birth control  high
# 3 medical Anorexia Nervosa   low
# 4 medical Anorexia Nervosa   low
# 5  family       Alcoholism  high
# 6  family       Alcoholism  high
A "data.table" approach would follow the same steps; after the initial as.data.table() conversion, each := step modifies the table by reference rather than copying it.
library(data.table)
as.data.table(mydf)[
  , condition := sprintf("%s_%s", second, third)][
  , condition := as.character(
      factor(condition,
             c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
             c("social", "medical", "family")))][
  first == "", first := condition][
  , condition := NULL][]
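The named-vector form of the lookup mentioned at the start of this answer might look roughly like this in base R. It is a sketch only: `mydf` is assumed to hold the same sample data as in the question with character columns, and `lookup`, `key`, and `blank` are illustrative names.

```r
# Assumed sample data (character columns), matching the question.
mydf <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Named character vector: names are "second_third" keys, values are the
# entries to put into 'first'.
lookup <- c("birth control_high"   = "social",
            "Anorexia Nervosa_low" = "medical",
            "Alcoholism_high"      = "family")

# Build the key for every row, then overwrite only the blank entries.
key   <- paste(mydf$second, mydf$third, sep = "_")
blank <- mydf$first == ""
mydf$first[blank] <- unname(lookup[key[blank]])
```

A blank row whose key is missing from `lookup` would end up as NA, which makes unmatched combinations easy to spot afterwards.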
Upvotes: 2