Duck
Duck

Reputation: 39595

Create variables in a dataframe in R after comparing one by one with another dataframe and with specific name

I am working with two dataframes in R and I am trying to build new variables after comparing each variable with a dataframe in order to bring a specific value. My dataframes are DF and key and they look like these (I add the dput version in the final part):

DF

   ID V1    V2     V3 V4 V5
1 001  Y Zone1 Level0 M1  T
2 002  N Zone2 Level1 M2  O
3 003  N Zone3 Level0 M3 NR
4 004  Y    NR Level0 M4  O
5 005 NR    NR Level0 M4  O
6 006  Y    NR Level1 M4  T
7 007  Y Zone3 Level1 M1  T
8 008  N Zone2 Level1 M1  T
9 009 NR Zone2 Level0 M1  T 

DF has and ID variable and five variables. The other key has the next form:

key
     Name Value
1       Y    15
2       N    20
3   Zone1    34
4   Zone2     1
5   Zone3     2
6  Level0     9
7  Level1     8
8      M1     4
9      M2     5
10     M3     6
11     M4     8
12     M5     9
13      T     2
14      O     5

This dataframe has two variables Name and Value. So, I want to compare each of the variables in DF (V1,V2,V3,V4,V5) with Name variable from key in order to create new variables that save the values of Value after their respective comparison. The new variables must have the next name structure: VI_Name.of.variable.in.DF_M, for that reason I have build this code:

DF$VI_V1_M=key[match(DF$V1,key$Name),"Value"]
DF$VI_V2_M=key[match(DF$V2,key$Name),"Value"]
DF$VI_V3_M=key[match(DF$V3,key$Name),"Value"]
DF$VI_V4_M=key[match(DF$V4,key$Name),"Value"]
DF$VI_V5_M=key[match(DF$V5,key$Name),"Value"]

With these lines, I compare each variable in DF with Name in key to bring Value, but I don't know how to create a for structure or a function with apply in order to build the required variables because the number of variables in DF can increase. I would like to get something like this:

   ID V1    V2     V3 V4 V5 VI_V1_M VI_V2_M VI_V3_M VI_V4_M VI_V5_M
1 001  Y Zone1 Level0 M1  T      15      34       9       4       2
2 002  N Zone2 Level1 M2  O      20       1       8       5       5
3 003  N Zone3 Level0 M3 NR      20       2       9       6      NA
4 004  Y    NR Level0 M4  O      15      NA       9       8       5
5 005 NR    NR Level0 M4  O      NA      NA       9       8       5
6 006  Y    NR Level1 M4  T      15      NA       8       8       2
7 007  Y Zone3 Level1 M1  T      15       2       8       4       2
8 008  N Zone2 Level1 M1  T      20       1       8       4       2
9 009 NR Zone2 Level0 M1  T      NA       1       9       4       2

Where the results of comparing each variables are saved in the new variables. I am looking for a solution to reduce the number of my lines because in case of 100 variables in DF I would have to write 100 lines of code. The dput() version of DF and key is the next:

DF<-structure(list(ID = c("001", "002", "003", "004", "005", "006", 
"007", "008", "009"), V1 = c("Y", "N", "N", "Y", "NR", "Y", "Y", 
"N", "NR"), V2 = c("Zone1", "Zone2", "Zone3", "NR", "NR", "NR", 
"Zone3", "Zone2", "Zone2"), V3 = c("Level0", "Level1", "Level0", 
"Level0", "Level0", "Level1", "Level1", "Level1", "Level0"), 
    V4 = c("M1", "M2", "M3", "M4", "M4", "M4", "M1", "M1", "M1"
    ), V5 = c("T", "O", "NR", "O", "O", "T", "T", "T", "T")), .Names = c("ID", 
"V1", "V2", "V3", "V4", "V5"), row.names = c(NA, -9L), class = "data.frame")

key<-structure(list(Name = c("Y", "N", "Zone1", "Zone2", "Zone3", 
"Level0", "Level1", "M1", "M2", "M3", "M4", "M5", "T", "O"), 
    Value = c(15, 20, 34, 1, 2, 9, 8, 4, 5, 6, 8, 9, 2, 5)), .Names = c("Name", 
"Value"), row.names = c(NA, -14L), class = "data.frame")

Upvotes: 2

Views: 162

Answers (1)

aosmith
aosmith

Reputation: 36076

You could tackle this as a "reshape-merge-reshape-merge" problem, although there may be a simpler approach.

If you reshaped DF to a long format, you could merge this by Name with key to get all your values in one dataset. Then you could remove the Name variable from the dataset, change the values V1-V5 (now in a column) to match your naming structure, and then reshape the dataset back to a wide format with the former column Value as the column values. One more merge with the original DF gets it into the final format you wanted. It seems like there will be a way to do thsi with less steps, but I haven't thought of anything yet.

I used package tidyr for reshaping and dplyr for manipulation.

library(tidyr)
library(dplyr)

DF %>%
    gather(Variable, Name, V1:V5) %>% # Put DF in long format
    inner_join(key, by = "Name") %>% # Join with "key" dataset
    select(-Name) %>% # Remove "Name" now so doesn't interfere with spread()
    mutate(Variable = paste("VI", Variable, "M", sep = "_")) %>% # Make names for columns
    spread(Variable, Value) %>% # Put dataset back into wide format
    inner_join(DF, ., by = "ID") # Join with original DF

   ID V1    V2     V3 V4 V5 VI_V1_M VI_V2_M VI_V3_M VI_V4_M VI_V5_M
1 001  Y Zone1 Level0 M1  T      15      34       9       4       2
2 002  N Zone2 Level1 M2  O      20       1       8       5       5
3 003  N Zone3 Level0 M3 NR      20       2       9       6      NA
4 004  Y    NR Level0 M4  O      15      NA       9       8       5
5 005 NR    NR Level0 M4  O      NA      NA       9       8       5
6 006  Y    NR Level1 M4  T      15      NA       8       8       2
7 007  Y Zone3 Level1 M1  T      15       2       8       4       2
8 008  N Zone2 Level1 M1  T      20       1       8       4       2
9 009 NR Zone2 Level0 M1  T      NA       1       9       4       2

Upvotes: 1

Related Questions