Joe

Reputation: 1768

Check if column name occurs in a specific data frame in a list of data frames

I have a data frame like this:

df <- data.frame(c(1, 2), NA, NA, NA)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01")

And a list of data frames like this:

id_list <- list(data.frame(id = c(1, 1), date = c("2017-03-01", "2017-01-01")),
                data.frame(id = c(2, 2), date = c("2017-02-01", "2017-03-01")))

My goal is to fill the date columns of df with 0s and 1s, depending on whether each date occurs in the data frame in id_list that belongs to the row's id. The final output should be:

> df_final
  id 2017-01-01 2017-02-01 2017-03-01
1  1          1          0          1
2  2          0          1          1

In reality, df has 170 columns and 2400 rows; id_list has 2400 data frames, each with 1 to 100 rows and 20 columns. Note that the data frames in id_list are not sorted by date.

EDIT: I just tried LAP's solution for:

df <- data.frame(c(1, 2), 0, 0, 0, 0)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")
id_list <- list(data.frame(id = c(1, 1),
                           date = c("2017-03-01", "2017-01-01"),
                           stringsAsFactors = FALSE),
                data.frame(id = c(2, 2, 2),
                           date = c("2017-02-01", "2017-03-01", "2017-04-1"),
                           stringsAsFactors = FALSE))

Unfortunately, the output was

> df
  id 2017-01-01 2017-02-01 2017-03-01 2017-04-01
1  1          1          0          1          0
2  2          0          1          1          0

instead of

> df
  id 2017-01-01 2017-02-01 2017-03-01 2017-04-01
1  1          1          0          1          0
2  2          0          1          1          1

EDIT2: The problem was a typo in my test data: I wrote 2017-04-1 instead of 2017-04-01.
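
As a sanity check for this kind of problem, something like the following sketch lists the date values in id_list that have no matching column in df. It uses the test data from the edit above; the names all_dates and unmatched are only illustrative.

```r
# Test data from the edit above, including the malformed date "2017-04-1".
df <- data.frame(c(1, 2), 0, 0, 0, 0)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01")
id_list <- list(data.frame(id = c(1, 1),
                           date = c("2017-03-01", "2017-01-01"),
                           stringsAsFactors = FALSE),
                data.frame(id = c(2, 2, 2),
                           date = c("2017-02-01", "2017-03-01", "2017-04-1"),
                           stringsAsFactors = FALSE))

# Collect every distinct date that appears anywhere in id_list.
all_dates <- unique(unlist(lapply(id_list, function(x) as.character(x$date))))

# Dates with no column of the same name in df can never produce a 1.
unmatched <- setdiff(all_dates, names(df)[-1])
unmatched
```

An empty result would mean every date in id_list has a corresponding column in df.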

Upvotes: 0

Views: 663

Answers (2)

akrun

Reputation: 887118

Another option is to rbind the data frames in 'id_list' and then use row/column (matrix) indexing to assign the 1 values. If the remaining values should be 0, it is better to construct df with 0 instead of NA (see the data section at the end).

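# Stack all data frames in id_list into a single data frame
d1 <- do.call(rbind, id_list)
# One (row, column) pair per entry; match() returns 0 for dates without a column
i1 <- cbind(match(d1$id, df$id), match(d1$date, names(df)[-1], nomatch = 0))
df[-1][i1] <- 1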
df
#   id 2017-01-01 2017-02-01 2017-03-01
#1  1          1          0          1
#2  2          0          1          1
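
The indexing step may be easier to follow on a plain matrix. The following is only a minimal sketch with made-up values (m and idx are hypothetical names, unrelated to the data above): each row of a two-column index matrix is read as one (row, column) pair.

```r
# Toy matrix filled with 0, two rows and three columns.
m <- matrix(0, nrow = 2, ncol = 3)

# Each row of idx is one (row, column) pair: (1, 3), (2, 1) and (2, 2).
idx <- cbind(c(1, 2, 2), c(3, 1, 2))

# Only the three cells named by the pairs are set to 1.
m[idx] <- 1
m
```

In the answer's code, i1 plays the role of idx: the first column holds the row of each id in df and the second column holds the position of each date among the date columns.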

data

df <- data.frame(c(1, 2), 0, 0, 0)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01")

Upvotes: 1

LAP

Reputation: 6685

You could use a for loop over the column names and, for each name, use sapply to go through id_list and check whether that name occurs in the date column of each data frame:

for(i in names(df)[-1]){
  df[, i] <- as.numeric(sapply(id_list, function(x) i %in% x[, "date"]))
}

> df
  id 2017-01-01 2017-02-01 2017-03-01
1  1          1          0          1
2  2          0          1          1
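
Note that the loop above relies on id_list being in the same order as the rows of df. If that is not guaranteed, a variant along these lines might help. It is only a sketch, assuming each data frame in id_list contains a single id that also occurs in df$id; date_cols, hits and ids are illustrative names.

```r
# Example data from the question.
df <- data.frame(c(1, 2), NA, NA, NA)
colnames(df) <- c("id", "2017-01-01", "2017-02-01", "2017-03-01")
id_list <- list(data.frame(id = c(1, 1), date = c("2017-03-01", "2017-01-01")),
                data.frame(id = c(2, 2), date = c("2017-02-01", "2017-03-01")))

date_cols <- names(df)[-1]

# One row per element of id_list: 1 if the date column occurs in it, else 0.
hits <- do.call(rbind, lapply(id_list, function(x) {
  as.numeric(date_cols %in% as.character(x$date))
}))

# Assumption: every data frame holds exactly one id, taken from its first row.
ids <- vapply(id_list, function(x) x$id[1], numeric(1))

# Write each row of hits into the row of df with the matching id.
df[match(ids, df$id), date_cols] <- hits
df
```

This builds the whole 0/1 matrix first and assigns it in one step, matching rows by id rather than by position.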

Upvotes: 2
