Reputation: 39595
I am working with two dataframes in R and I am trying to build new variables after comparing each variable with a dataframe in order to bring a specific value. My dataframes are DF
and key
and they look like these (I add the dput
version in the final part):
DF
ID V1 V2 V3 V4 V5
1 001 Y Zone1 Level0 M1 T
2 002 N Zone2 Level1 M2 O
3 003 N Zone3 Level0 M3 NR
4 004 Y NR Level0 M4 O
5 005 NR NR Level0 M4 O
6 006 Y NR Level1 M4 T
7 007 Y Zone3 Level1 M1 T
8 008 N Zone2 Level1 M1 T
9 009 NR Zone2 Level0 M1 T
DF
has and ID variable and five variables. The other key
has the next form:
key
Name Value
1 Y 15
2 N 20
3 Zone1 34
4 Zone2 1
5 Zone3 2
6 Level0 9
7 Level1 8
8 M1 4
9 M2 5
10 M3 6
11 M4 8
12 M5 9
13 T 2
14 O 5
This dataframe has two variables Name
and Value
. So, I want to compare each of the variables in DF
(V1,V2,V3,V4,V5
) with Name
variable from key
in order to create new variables that save the values of Value
after their respective comparison. The new variables must have the next name structure: VI_Name.of.variable.in.DF_M
, for that reason I have build this code:
DF$VI_V1_M=key[match(DF$V1,key$Name),"Value"]
DF$VI_V2_M=key[match(DF$V2,key$Name),"Value"]
DF$VI_V3_M=key[match(DF$V3,key$Name),"Value"]
DF$VI_V4_M=key[match(DF$V4,key$Name),"Value"]
DF$VI_V5_M=key[match(DF$V5,key$Name),"Value"]
With these lines, I compare each variable in DF
with Name
in key
to bring Value
, but I don't know how to create a for structure or a function with apply in order to build the required variables because the number of variables in DF
can increase. I would like to get something like this:
ID V1 V2 V3 V4 V5 VI_V1_M VI_V2_M VI_V3_M VI_V4_M VI_V5_M
1 001 Y Zone1 Level0 M1 T 15 34 9 4 2
2 002 N Zone2 Level1 M2 O 20 1 8 5 5
3 003 N Zone3 Level0 M3 NR 20 2 9 6 NA
4 004 Y NR Level0 M4 O 15 NA 9 8 5
5 005 NR NR Level0 M4 O NA NA 9 8 5
6 006 Y NR Level1 M4 T 15 NA 8 8 2
7 007 Y Zone3 Level1 M1 T 15 2 8 4 2
8 008 N Zone2 Level1 M1 T 20 1 8 4 2
9 009 NR Zone2 Level0 M1 T NA 1 9 4 2
Where the results of comparing each variables are saved in the new variables. I am looking for a solution to reduce the number of my lines because in case of 100 variables in DF
I would have to write 100 lines of code. The dput()
version of DF
and key
is the next:
DF<-structure(list(ID = c("001", "002", "003", "004", "005", "006",
"007", "008", "009"), V1 = c("Y", "N", "N", "Y", "NR", "Y", "Y",
"N", "NR"), V2 = c("Zone1", "Zone2", "Zone3", "NR", "NR", "NR",
"Zone3", "Zone2", "Zone2"), V3 = c("Level0", "Level1", "Level0",
"Level0", "Level0", "Level1", "Level1", "Level1", "Level0"),
V4 = c("M1", "M2", "M3", "M4", "M4", "M4", "M1", "M1", "M1"
), V5 = c("T", "O", "NR", "O", "O", "T", "T", "T", "T")), .Names = c("ID",
"V1", "V2", "V3", "V4", "V5"), row.names = c(NA, -9L), class = "data.frame")
key<-structure(list(Name = c("Y", "N", "Zone1", "Zone2", "Zone3",
"Level0", "Level1", "M1", "M2", "M3", "M4", "M5", "T", "O"),
Value = c(15, 20, 34, 1, 2, 9, 8, 4, 5, 6, 8, 9, 2, 5)), .Names = c("Name",
"Value"), row.names = c(NA, -14L), class = "data.frame")
Upvotes: 2
Views: 162
Reputation: 36076
You could tackle this as a "reshape-merge-reshape-merge" problem, although there may be a simpler approach.
If you reshaped DF
to a long format, you could merge this by Name
with key
to get all your values in one dataset. Then you could remove the Name
variable from the dataset, change the values V1
-V5
(now in a column) to match your naming structure, and then reshape the dataset back to a wide format with the former column Value
as the column values. One more merge with the original DF
gets it into the final format you wanted. It seems like there will be a way to do thsi with less steps, but I haven't thought of anything yet.
I used package tidyr for reshaping and dplyr for manipulation.
library(tidyr)
library(dplyr)
DF %>%
gather(Variable, Name, V1:V5) %>% # Put DF in long format
inner_join(key, by = "Name") %>% # Join with "key" dataset
select(-Name) %>% # Remove "Name" now so doesn't interfere with spread()
mutate(Variable = paste("VI", Variable, "M", sep = "_")) %>% # Make names for columns
spread(Variable, Value) %>% # Put dataset back into wide format
inner_join(DF, ., by = "ID") # Join with original DF
ID V1 V2 V3 V4 V5 VI_V1_M VI_V2_M VI_V3_M VI_V4_M VI_V5_M
1 001 Y Zone1 Level0 M1 T 15 34 9 4 2
2 002 N Zone2 Level1 M2 O 20 1 8 5 5
3 003 N Zone3 Level0 M3 NR 20 2 9 6 NA
4 004 Y NR Level0 M4 O 15 NA 9 8 5
5 005 NR NR Level0 M4 O NA NA 9 8 5
6 006 Y NR Level1 M4 T 15 NA 8 8 2
7 007 Y Zone3 Level1 M1 T 15 2 8 4 2
8 008 N Zone2 Level1 M1 T 20 1 8 4 2
9 009 NR Zone2 Level0 M1 T NA 1 9 4 2
Upvotes: 1