Check if column name occurs in a specific data frame in a list of data frames

Question

I have a data frame like this:

df <- data.frame(c(1, 2), NA, NA, NA)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01")

And a list of data frames like this:

id_list <- list(data.frame(id = c(1, 1), date = c("2017-03-01", "2017-01-01")),
                data.frame(id = c(2, 2), date = c("2017-02-01", "2017-03-01")))

My goal is to fill the date columns of df with 0s and 1s: a cell should be 1 if that date occurs in the data frame belonging to that id in id_list, and 0 otherwise. Hence, the final output should be:

> df_final
  id 2017-01-01 2017-02-01 2017-03-01
1  1          1          0          1
2  2          0          1          1

In reality, df has 170 columns and 2400 rows; id_list has 2400 data frames, each with 1 to 100 rows and 20 columns. I should stress that the data frames in id_list are not sorted by date.

EDIT: I just tried LAP's solution on the following data:

df <- data.frame(c(1, 2), 0, 0, 0, 0)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")
id_list <- list(data.frame(id = c(1, 1),
                           date = c("2017-03-01", "2017-01-01"),
                           stringsAsFactors = FALSE),
                data.frame(id = c(2, 2, 2),
                           date = c("2017-02-01", "2017-03-01", "2017-04-1"),
                           stringsAsFactors = FALSE))

Unfortunately, the output was

> df
  id 2017-01-01 2017-02-01 2017-03-01 2017-04-01
1  1          1          0          1          0
2  2          0          1          1          0

instead of

> df
  id 2017-01-01 2017-02-01 2017-03-01 2017-04-01
1  1          1          0          1          0
2  2          0          1          1          1

EDIT2: The problem was a typo on my side: I wrote "2017-04-1" instead of "2017-04-01".
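With the typo fixed, LAP's loop gives the expected result. If the real data might contain dates that are not consistently zero-padded, one option is to normalise them before matching. A minimal sketch, assuming every date string is in year-month-day order; `normalize_dates` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: parse year-month-day strings and re-format them with
# zero-padded month and day, e.g. "2017-04-1" -> "2017-04-01".
# Unparseable strings become NA and will simply never match a column name.
normalize_dates <- function(x) {
  format(as.Date(as.character(x), format = "%Y-%m-%d"), "%Y-%m-%d")
}

df <- data.frame(c(1, 2), 0, 0, 0, 0)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")
id_list <- list(data.frame(id = c(1, 1),
                           date = c("2017-03-01", "2017-01-01"),
                           stringsAsFactors = FALSE),
                data.frame(id = c(2, 2, 2),
                           date = c("2017-02-01", "2017-03-01", "2017-04-1"),
                           stringsAsFactors = FALSE))

# Normalise the date column of every data frame before matching
id_list <- lapply(id_list, function(x) {
  x$date <- normalize_dates(x$date)
  x
})

# Same loop as in the accepted answer
for (i in names(df)[-1]) {
  df[, i] <- as.numeric(sapply(id_list, function(x) i %in% x[, "date"]))
}
df
```

Because `as.character` is applied first, this also works if the date column was read in as a factor.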

LAP · Accepted Answer

You could use a for loop over the date columns and pass each column name to an sapply call that loops through id_list and checks whether that name occurs in the date column of each data frame:

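# For each date column, check every data frame in id_list for that date.
# Note: this assumes id_list[[k]] belongs to the id in row k of df.
for (i in names(df)[-1]) {
  df[, i] <- as.numeric(sapply(id_list, function(x) i %in% x[, "date"]))
}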

> df
  id 2017-01-01 2017-02-01 2017-03-01
1  1          1          0          1
2  2          0          1          1
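The loop pairs the k-th data frame in id_list with the k-th row of df, so it relies on both being in the same order. A sketch that looks rows up by id instead, and replaces the 170 sapply passes with a single cross-tabulation, is to stack all data frames and use base R's table(). It assumes every data frame has id and date columns and that the ids in df are unique; it has not been benchmarked on the full-size data:

```r
df <- data.frame(c(1, 2), NA, NA, NA)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01")
id_list <- list(data.frame(id = c(1, 1), date = c("2017-03-01", "2017-01-01")),
                data.frame(id = c(2, 2), date = c("2017-02-01", "2017-03-01")))

# Stack only the id and date columns of every data frame into one long table
long <- do.call(rbind, lapply(id_list, function(x) {
  data.frame(id = x$id, date = as.character(x$date), stringsAsFactors = FALSE)
}))

# Factor levels fix the row order (ids of df) and column order (date columns
# of df); values outside those levels become NA and table() drops them
date_cols <- names(df)[-1]
counts <- table(factor(long$id, levels = df$id),
                factor(long$date, levels = date_cols))

# Any occurrence counts as 1; rows are matched by id, not by list position
df[, date_cols] <- (unclass(counts) > 0) * 1
df
```

Dates in id_list that are not columns of df are simply ignored, and an id without any rows gets all zeros.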
