Jaume
Jaume

Reputation: 189

How to recode multiple columns of one dataframe based on a column from another dataframe?

I want to create new columns in df1 based on two columns in df2. df1 has multiple columns ("x", "y" , "z"), with possible values 0,1, or NA. df2 has two columns: "a" and "b". The values of column "b" include the column names of df1 (although some values in df2$a may be present as columns in df1).

x <- c(1, 1, NA, NA, 0)
y <- c(0, 1, 1, NA, NA)
z <- c(1, 1, 0, 1, 1)

df1 <- data.frame(x, y, z)

a <- c( "Green", "Green", "Green",  "Red", "Red",   "Blue", "Blue", "Yellow")
b <- c(  "w",       "x",    "y",     "x",    "z",     "w" ,   "y",    "z"   )

df2 <- data.frame(a, b)

The new columns to be created in df1 should be named as the values of column df2$a. The value of each new column (i.e df1$Green) should be "1" when

The new_df1 should be:

new_df1

   x  y z Green Red Blue Yellow
1  1  0 1     1   1    0      1
2  1  1 1     1   1    1      1
3 NA  1 0     1   0    1      0
4 NA NA 1     0   1    0      1
5  0 NA 1     0   1    0      1

Upvotes: 1

Views: 123

Answers (3)

akrun
akrun

Reputation: 887223

An option in tidyverse

library(dplyr)
library(tidyr)
df2 %>% 
  filter(b %in% names(df1)) %>% 
  pivot_wider(names_from = a, values_from = b, 
    values_fn = function(x) list(+(rowSums(df1[x], na.rm  = TRUE) > 0))) %>%
  unnest(everything()) %>%
  bind_cols(df1, .)

-output

    x  y z Green Red Blue Yellow
1  1  0 1     1   1    0      1
2  1  1 1     1   1    1      1
3 NA  1 0     1   0    1      0
4 NA NA 1     0   1    0      1
5  0 NA 1     0   1    0      1

Upvotes: 2

Ma&#235;l
Ma&#235;l

Reputation: 52069

With split + rowSums:

df3 <- 
  split(df2, df2$a) |>
  sapply(\(x) +(rowSums(df1[intersect(colnames(df1), x$b)], na.rm = TRUE) > 0))

> cbind.data.frame(df1, df3)
#    x  y z Blue Green Red Yellow
# 1  1  0 1    0     1   1      1
# 2  1  1 1    1     1   1      1
# 3 NA  1 0    1     1   0      0
# 4 NA NA 1    0     0   1      1
# 5  0 NA 1    0     0   1      1

Upvotes: 2

Ric
Ric

Reputation: 5722

In pure base R:


x <- c(1, 1, NA, NA, 0)
y <- c(0, 1, 1, NA, NA)
z <- c(1, 1, 0, 1, 1)

df1 <- data.frame(x, y, z)

a <- c( "Green", "Green", "Green",  "Red", "Red",   "Blue", "Blue", "Yellow")
b <- c(  "w",       "x",    "y",     "x",    "z",     "w" ,   "y",    "z"   )
df2 <- data.frame(a, b)

# table of counts of colors/letters
d <- as.data.frame(t(unclass(table(df2))))

cols <- (t(apply(df1, 1, \(x) apply(d[colnames(df1)[as.logical(x)],],2,\(x) as.numeric(any(x))))))

#if you do not want NA's
cols[is.na(cols)] <- 0

# your desired order (can also use cbind)
df1[,c("Green", "Red","Blue","Yellow")] <- cols[,c("Green", "Red","Blue","Yellow")]

df1
#>    x  y z Green Red Blue Yellow
#> 1  1  0 1     1   1    0      1
#> 2  1  1 1     1   1    1      1
#> 3 NA  1 0     1   0    1      0
#> 4 NA NA 1     0   1    0      1
#> 5  0 NA 1     0   1    0      1

d is a table of count of colors in columns, letters as rownames. First apply searches for every row in df1 the rows -letters- in d that are 1 in df1. Second apply uses any to find in d ocurrence of any "1" for every color column. Note that \(x) is shorthand for function(x)

Upvotes: 3

Related Questions