Map one Dataframe to a second Dataframe

Question

I have two dataframes and want to map both and give a binary value 1 if present else 0.

1st DF

id       1_1   1_2   1_3   1_4   1_5   1_6   1_7   1_8   1_9   1_10  1_freq
111.txt  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
112.txt  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
113.txt  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
114.txt  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
115.txt  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
116.txt  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA

2nd DF

id                 cats
111.cats           1,7,1
112.cats           1,1,2|1,3,2
113.cats           1,10,1|1,6,2
114.cats           1,4,2
115.cats           1,5,1
116.cats           1,1,2|1,8,1

In 2nd DF$cats first row have 1,7,1 where 1 and 7 combines and makes 1_7 column and on this column binary value 1 is placed and on remaining column 0 is placed and last 1 number goes to the 1_freq column, and if any row has more than 1 category like this 1,10,1|1,6,2 where 1,10,1 goes to 1_10 column and 1,6,2 goes to 1_6 column and the frequency of both category sums up and goes to 1_freq column.

DF Should look like this

id       1_1   1_2   1_3   1_4   1_5   1_6   1_7   1_8   1_9   1_10  1_freq
111.txt  0     0     0     0     0     0     1     0     0     0     1
112.txt  1     0     1     0     0     0     0     0     0     0     4
113.txt  0     0     0     0     0     1     0     0     0     1     3
114.txt  0     0     0     1     0     0     0     0     0     0     2
115.txt  0     0     0     0     1     0     0     0     0     0     1
116.txt  1     0     0     0     0     0     0     1     0     0     3

Hope question is clear. Thankyou

akrun · Accepted Answer

Here is an option using tidyverse. We expand the rows of the dataset by splitting at | of the 'cats' column, then separate the 'cats' into two column by splitting at the last , , grouped by 'id', get the sum of the 'freq' column, extract the numbers at the end of the 'cats', convert it to factor with levels specified, create a column of 1s ('val'), spread it to 'wide' format

library(tidyverse)
o1 <- df2 %>% 
       separate_rows(cats, sep = "[|]") %>% 
       separate(cats, into = c('cats', 'freq'), 
           sep=",(?=[^,]+$)", convert = TRUE) %>%
       group_by(id) %>%
       mutate(freq = sum(freq), 
              cats = factor(str_extract(cats, "\d+$"), levels = 1:10), 
              val = 1)  %>% 
       spread(cats, val, fill = 0) %>% 
       rename_at(-1, ~ paste0('1_', .))

Now, we assign values for the columns that are common with the initial dataset ('df1')

df1[is.na(df1)] <- 0
df1[names(o1)[-1]] <- o1[-1]
df1
#       id 1_1 1_2 1_3 1_4 1_5 1_6 1_7 1_8 1_9 1_10 1_freq
#1 111.txt   0   0   0   0   0   0   1   0   0    0      1
#2 112.txt   1   0   1   0   0   0   0   0   0    0      4
#3 113.txt   0   0   0   0   0   1   0   0   0    1      3
#4 114.txt   0   0   0   1   0   0   0   0   0    0      2
#5 115.txt   0   0   0   0   1   0   0   0   0    0      1
#6 116.txt   1   0   0   0   0   0   0   1   0    0      3

data

df1 <- structure(list(id = c("111.txt", "112.txt", "113.txt", "114.txt", 
"115.txt", "116.txt"), `1_1` = c(NA, NA, NA, NA, NA, NA), `1_2` = c(NA, 
NA, NA, NA, NA, NA), `1_3` = c(NA, NA, NA, NA, NA, NA), `1_4` = c(NA, 
NA, NA, NA, NA, NA), `1_5` = c(NA, NA, NA, NA, NA, NA), `1_6` = c(NA, 
NA, NA, NA, NA, NA), `1_7` = c(NA, NA, NA, NA, NA, NA), `1_8` = c(NA, 
NA, NA, NA, NA, NA), `1_9` = c(NA, NA, NA, NA, NA, NA), `1_10` = c(NA, 
NA, NA, NA, NA, NA), `1_freq` = c(NA, NA, NA, NA, NA, NA)),
    class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(id = c("111.cats", "112.cats", "113.cats", "114.cats", 
"115.cats", "116.cats"), cats = c("1,7,1", "1,1,2|1,3,2", "1,10,1|1,6,2", 
"1,4,2", "1,5,1", "1,1,2|1,8,1")), class = "data.frame", row.names = c(NA, 
-6L))

Map one Dataframe to a second Dataframe

Answers (2)

Explanation

Data

data

Related Questions