UPDATE Below :Original I'm trying to find the most elegant (simple and concise) way of replacing the value of certain columns based on matching two columns from another data frame. Here is the table with the columns I wish to replace (based on what values they contain). > cost.table Identifier Phase.0.Difficulty Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity 1 FS1 Low Low Low Medium Medium High 2 FS2 High High High Medium Medium Medium 3 FS3 High Low Low High High High 4 FS4 High Medium Medium Medium Medium Medium 5 FS5 High Medium Medium High Medium Medium Phase.6.Complexity Transaction.Feasibility Approach 1 High Medium B 2 Medium Medium I 3 High Medium B 4 Medium Medium I 5 Medium Medium B And here are the lookup tables that I wish to use to find the correct replacement value. > cost.approach.difficulty Approach Difficulty Phase 0 Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6 1 B High 18102.778 29481.67 29481.67 11822.222 30737.78 21634.67 12768.00 2 B Low 3860.694 15978.47 11175.69 7448.000 12768.00 11467.56 11467.56 3 B Medium 5323.694 24974.44 15184.17 9221.333 15368.89 12768.00 12768.00 4 I High 18102.778 74184.44 29481.67 44747.111 69160.00 45249.56 32245.11 5 I Low 3860.694 26008.89 11175.69 16551.111 35910.00 16876.22 14275.33 6 I Medium 5323.694 41156.11 15184.17 22373.556 44776.67 23378.44 16876.22 7 RV High 18102.778 28373.33 29481.67 44747.111 69160.00 45249.56 32245.11 8 RV Low 3860.694 14870.14 11175.69 16551.111 44776.67 16876.22 14275.33 9 RV Medium 5323.694 22757.78 15184.17 22373.556 44776.67 23378.44 16876.22 I'm trying to find a simple solution to look up the corresponding value in the cost.approach.difficulty table for 'Approach' and 'Difficulty'. So for example, in the cost.table, I would like to have the first row, column Phase.0.Difficulty, be replaced with 3860.694 (because it is a 'B' approach and low difficulty. Does anyone have an elegant, simple solution to looking up a value based on two (or more columns) and replacing the value along multiple columns? Thank you, Andrew UPDATE - There are two proposed answers related to using merge. My goal is to find a more succinct, concise, and elegant solution. Here is the best that I've come up with so far: cost.approach.difficulty$Phase.0[match(paste(cost.table$Approach, cost.table$Phase.0.Difficulty), paste(cost.approach.difficulty$Approach, cost.approach.difficulty$Difficulty))] The problem with this solution is that I'd need to know the column names ahead of time and still seems like a hack. Anyone have a more concise solution?

Look up (and Replace) values in data frame based on lookup table (with more than one column)

Reputation: 3601

If you want this to work for a variable number of columns, I suggests reshaping your cost table and your lookup table(s) into a more standardized format.

First, it would have been easier to answer this question if you had given your data in a reproducible format:

# Create the example data
cost.table <- data.frame(
  "Identifier" = c("FS1", "FS2",  "FS3",  "FS4",  "FS5"),
  "Phase.0.Difficulty" = c("Low", "High", "High", "High", "High"),
  "Phase.1.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
  "Phase.2.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
  "Phase.3.Complexity" = c("Medium", "Medium", "High", "Medium", "High"),
  "Phase.4.Complexity" = c("Medium", "Medium", "High", "Medium", "Medium"),
  "Phase.5.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
  "Phase.6.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
  "Transaction.Feasibility" = c("Medium", "Medium", "Medium", "Medium", "Medium"),
  "Approach" = c("B", "I", "B", "I", "B"),
  stringsAsFactors = FALSE)

cost.approach.difficulty <- data.frame(
  "Approach" = c("B", "B", "B", "I", "I", "I", "RV", "RV", "RV"),
  "Difficulty" = c("High", "Low", "Medium", "High", "Low", "Medium", "High", "Low", "Medium"),
  "Phase.0" = c(18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694),
  "Phase.1" = c(29481.67,15978.47, 24974.44, 74184.44, 26008.89, 41156.11, 28373.33, 14870.14, 22757.78),
  "Phase.2" = c(29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17),
  "Phase.3" = c(11822.222, 7448, 9221.333, 44747.111, 16551.111, 22373.556, 44747.111, 16551.111, 22373.556),
  "Phase.4" = c(30737.78, 12768, 15368.89, 69160, 35910, 44776.67, 69160, 44776.67, 44776.67),
  "Phase.5" = c(21634.67, 11467.56, 12768, 45249.56, 16876.22, 23378.44, 45249.56, 16876.22, 23378.44),
  "Phase.6" = c(12768, 11467.56, 12768, 32245.11, 14275.33, 16876.22, 32245.11, 14275.33, 16876.22),
  stringsAsFactors = FALSE)

Once I had recreated your example data, I used the melt.data.frame function from the reshape2 package:

# Reshape the data
require(reshape2)

cost.table <- melt(cost.table, id.vars = c("Identifier", "Approach"), 
  value.name = "Size")
cost.table$Phase <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\1", 
  as.character(cost.table$variable), perl = TRUE)
cost.table$Type <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\2", 
  as.character(cost.table$variable), perl = TRUE)

head(cost.table)

  Identifier Approach           variable Size   Phase       Type
1        FS1        B Phase.0.Difficulty  Low Phase.0 Difficulty
2        FS2        I Phase.0.Difficulty High Phase.0 Difficulty
3        FS3        B Phase.0.Difficulty High Phase.0 Difficulty
4        FS4        I Phase.0.Difficulty High Phase.0 Difficulty
5        FS5        B Phase.0.Difficulty High Phase.0 Difficulty
6        FS1        B Phase.1.Complexity  Low Phase.1 Complexity

cost.approach.difficulty <- melt(cost.approach.difficulty, 
  id.vars = c("Difficulty", "Approach"), variable.name = "Phase")
cost.approach.difficulty$Phase <- as.character(cost.approach.difficulty$Phase)
cost.approach.difficulty$Type <- "Difficulty"
colnames(cost.approach.difficulty)[
  colnames(cost.approach.difficulty) == "Difficulty"] <- "Size"

head(cost.approach.difficulty)

    Size Approach   Phase     value       Type
1   High        B Phase.0 18102.778 Difficulty
2    Low        B Phase.0  3860.694 Difficulty
3 Medium        B Phase.0  5323.694 Difficulty
4   High        I Phase.0 18102.778 Difficulty
5    Low        I Phase.0  3860.694 Difficulty
6 Medium        I Phase.0  5323.694 Difficulty

Once the two tables are in a standardized format, you can call merge:

cost.table.filled <- merge(cost.table, cost.approach.difficulty, 
  by = c("Approach", "Size", "Phase", "Type"), all.x = TRUE, all.y = FALSE)

Then, if you didn't look up the values for certain columns, you can re-insert the original values (otherwise you end up with a bunch of NAs):

cost.table.filled$value[is.na(cost.table.filled$value)] <- 
  cost.table.filled$Size[is.na(cost.table.filled$value)]

Then you can dcast the thing back into the original format:

cost.table.final <- dcast(cost.table.filled, Identifier + Approach ~ Phase + Type)

head(cost.table.final)

  Identifier Approach Phase.0_Difficulty Phase.1_Complexity Phase.2_Complexity Phase.3_Complexity Phase.4_Complexity Phase.5_Complexity Phase.6_Complexity Transaction.Feasibility_Transaction.Feasibility
1        FS1        B           3860.694                Low                Low             Medium             Medium               High               High                                          Medium
2        FS2        I          18102.778               High               High             Medium             Medium             Medium             Medium                                          Medium
3        FS3        B          18102.778                Low                Low               High               High               High               High                                          Medium
4        FS4        I          18102.778             Medium             Medium             Medium             Medium             Medium             Medium                                          Medium
5        FS5        B          18102.778             Medium             Medium               High             Medium             Medium             Medium                                          Medium

To replace all columns, I would melt each lookup table and then cbind them all together into a single lookup table. That way, you only have to call merge once and you don't have to worry about replacing NAs.

Upvotes: 4

Beasterfield

Reputation: 7123

In this case, merge should do the trick:

cost.table <- merge(
  x = cost.table,
  y = cost.approach.difficulty[c("Approach", "Difficulty", "Phase.0")],
  by.x = c("Phase.0.Difficulty", "Approach"),
  by.y = c("Difficulty", "Approach"), sort = FALSE
)
cost.table$Phase.0.Difficulty <- NULL
names(cost.table)[names(cost.table) == "Phase.0"] <- "Phase.0.Difficulty"

cost.table
  Approach Identifier Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity Phase.6.Complexity Transaction.Feasibility Phase.0.Difficulty
1        B        FS1                Low                Low             Medium             Medium               High               High                  Medium           3860.694
2        I        FS2               High               High             Medium             Medium             Medium             Medium                  Medium          18102.778
3        I        FS4             Medium             Medium             Medium             Medium             Medium             Medium                  Medium          18102.778
4        B        FS3                Low                Low               High               High               High               High                  Medium          18102.778
5        B        FS5             Medium             Medium               High             Medium             Medium             Medium                  Medium          18102.778

Upvotes: 0

Look up (and Replace) values in data frame based on lookup table (with more than one column)

Answers (3)

Related Questions