Learning
Learning

Reputation: 29

How to get new logical column comparing 2 dataframe in r

So I have two datasets:

I would like to compare these two dataset and create new logical column(True, False or 1, 0) to first one with 42 columns if any values from first dataset exist in second dataset.

I have tried to use %in% then there are so many combination I should use for example

First_dataset$log_1 <- First_dataset$first_key %in% second_dataset$key_first

First_dataset$log_2 <- First_dataset$first_key %in% second_dataset$key_second

First_dataset$log_3 <- First_dataset$first_key %in% second_dataset$key_third

First_dataset$log_4 <- First_dataset$second_key %in% second_dataset$key_first

First_dataset$log_5 <- First_dataset$second_key %in% second_dataset$key_second

First_dataset$log_6 <- First_dataset$second_key %in% second_dataset$key_third

until 42 columns...

Is there any simpler method(or function) I can employ to get new logical columns with less lines of code?

Upvotes: 0

Views: 44

Answers (1)

Vincent
Vincent

Reputation: 309

Giving the number of observations in your datasets, I doubt the following solution would be suitable, but I assume this would work for smaller datasets.

The trick is to reshape your data, that is currently stored in a wide format, to a long format, see ?reshape. You can then perform your comparison.

If you want to go back to a wide format, you can do so, adding as much columns as your original first dataset.

Here is a little reproducible example.

# Your dataset with 42 columns
df1 <- data.frame(id = 1:5,
                  col1 = c("a", "b", "c", "d", "e"),
                  col2 = c("f", "g", "h", "i", "j"),
                  col3 = c("k", "l", "m", "n", "o"))

# Your dataset with 4 columns
df2 <- data.frame(id = 1:3,
                  col1 = c("a", "b", "e"),
                  col2 = c("g", "h", "m"))

# Reshape your datasets to long format
df1_long <- reshape(df1,
                    varying = list(paste0("col", 1:3)),
                    direction = "long",
                    idvar = "id",
                    v.names = "col")

df2_long <- reshape(df2,
                    varying = list(paste0("col", 1:2)),
                    direction = "long",
                    idvar = "id",
                    v.names = "col")

# Perform your comparison
df1_long$log <- df1_long$col %in% df2_long$col

# Go back to your initial wide format
df1_wide <- reshape(df1_long,
                    direction = "wide",
                    timevar = "time",
                    v.names = c("col", "log"),
                    idvar = "id",
                    sep = "")

Upvotes: 1

Related Questions