Reputation: 43
UPDATE Below
:Original
I'm trying to find the most elegant (simple and concise) way of replacing the value of certain columns based on matching two columns from another data frame.
Here is the table with the columns I wish to replace (based on what values they contain).
> cost.table
Identifier Phase.0.Difficulty Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity
1 FS1 Low Low Low Medium Medium High
2 FS2 High High High Medium Medium Medium
3 FS3 High Low Low High High High
4 FS4 High Medium Medium Medium Medium Medium
5 FS5 High Medium Medium High Medium Medium
Phase.6.Complexity Transaction.Feasibility Approach
1 High Medium B
2 Medium Medium I
3 High Medium B
4 Medium Medium I
5 Medium Medium B
And here are the lookup tables that I wish to use to find the correct replacement value.
> cost.approach.difficulty
Approach Difficulty Phase 0 Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6
1 B High 18102.778 29481.67 29481.67 11822.222 30737.78 21634.67 12768.00
2 B Low 3860.694 15978.47 11175.69 7448.000 12768.00 11467.56 11467.56
3 B Medium 5323.694 24974.44 15184.17 9221.333 15368.89 12768.00 12768.00
4 I High 18102.778 74184.44 29481.67 44747.111 69160.00 45249.56 32245.11
5 I Low 3860.694 26008.89 11175.69 16551.111 35910.00 16876.22 14275.33
6 I Medium 5323.694 41156.11 15184.17 22373.556 44776.67 23378.44 16876.22
7 RV High 18102.778 28373.33 29481.67 44747.111 69160.00 45249.56 32245.11
8 RV Low 3860.694 14870.14 11175.69 16551.111 44776.67 16876.22 14275.33
9 RV Medium 5323.694 22757.78 15184.17 22373.556 44776.67 23378.44 16876.22
I'm trying to find a simple solution to look up the corresponding value in the cost.approach.difficulty table for 'Approach' and 'Difficulty'.
So for example, in the cost.table, I would like to have the first row, column Phase.0.Difficulty, be replaced with 3860.694 (because it is a 'B' approach and low difficulty.
Does anyone have an elegant, simple solution to looking up a value based on two (or more columns) and replacing the value along multiple columns?
Thank you,
Andrew
UPDATE -
There are two proposed answers related to using merge. My goal is to find a more succinct, concise, and elegant solution. Here is the best that I've come up with so far:
cost.approach.difficulty$Phase.0[match(paste(cost.table$Approach, cost.table$Phase.0.Difficulty), paste(cost.approach.difficulty$Approach, cost.approach.difficulty$Difficulty))]
The problem with this solution is that I'd need to know the column names ahead of time and still seems like a hack. Anyone have a more concise solution?
Upvotes: 1
Views: 7315
Reputation: 43
The simplest answer appears to be to:
The code below accomplishes the multiple column lookup with one line.
cost.approach.difficulty$Phase.0[match(paste(cost.table$Approach,
cost.table$Phase.0.Difficulty), paste(cost.approach.difficulty$Approach,
cost.approach.difficulty$Difficulty))]
To cycle through multiple columns, a for loop works fine.
Unfortunately, I was hoping for a native solution that perhaps took a vector of columns and combined them for the lookup, but I haven't found it yet. I'll be checking additional packages to see if such a function exists.
Upvotes: 0
Reputation: 3601
If you want this to work for a variable number of columns, I suggests reshaping your cost table and your lookup table(s) into a more standardized format.
First, it would have been easier to answer this question if you had given your data in a reproducible format:
# Create the example data
cost.table <- data.frame(
"Identifier" = c("FS1", "FS2", "FS3", "FS4", "FS5"),
"Phase.0.Difficulty" = c("Low", "High", "High", "High", "High"),
"Phase.1.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
"Phase.2.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
"Phase.3.Complexity" = c("Medium", "Medium", "High", "Medium", "High"),
"Phase.4.Complexity" = c("Medium", "Medium", "High", "Medium", "Medium"),
"Phase.5.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
"Phase.6.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
"Transaction.Feasibility" = c("Medium", "Medium", "Medium", "Medium", "Medium"),
"Approach" = c("B", "I", "B", "I", "B"),
stringsAsFactors = FALSE)
cost.approach.difficulty <- data.frame(
"Approach" = c("B", "B", "B", "I", "I", "I", "RV", "RV", "RV"),
"Difficulty" = c("High", "Low", "Medium", "High", "Low", "Medium", "High", "Low", "Medium"),
"Phase.0" = c(18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694),
"Phase.1" = c(29481.67,15978.47, 24974.44, 74184.44, 26008.89, 41156.11, 28373.33, 14870.14, 22757.78),
"Phase.2" = c(29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17),
"Phase.3" = c(11822.222, 7448, 9221.333, 44747.111, 16551.111, 22373.556, 44747.111, 16551.111, 22373.556),
"Phase.4" = c(30737.78, 12768, 15368.89, 69160, 35910, 44776.67, 69160, 44776.67, 44776.67),
"Phase.5" = c(21634.67, 11467.56, 12768, 45249.56, 16876.22, 23378.44, 45249.56, 16876.22, 23378.44),
"Phase.6" = c(12768, 11467.56, 12768, 32245.11, 14275.33, 16876.22, 32245.11, 14275.33, 16876.22),
stringsAsFactors = FALSE)
Once I had recreated your example data, I used the melt.data.frame
function from the reshape2
package:
# Reshape the data
require(reshape2)
cost.table <- melt(cost.table, id.vars = c("Identifier", "Approach"),
value.name = "Size")
cost.table$Phase <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\1",
as.character(cost.table$variable), perl = TRUE)
cost.table$Type <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\2",
as.character(cost.table$variable), perl = TRUE)
head(cost.table)
Identifier Approach variable Size Phase Type
1 FS1 B Phase.0.Difficulty Low Phase.0 Difficulty
2 FS2 I Phase.0.Difficulty High Phase.0 Difficulty
3 FS3 B Phase.0.Difficulty High Phase.0 Difficulty
4 FS4 I Phase.0.Difficulty High Phase.0 Difficulty
5 FS5 B Phase.0.Difficulty High Phase.0 Difficulty
6 FS1 B Phase.1.Complexity Low Phase.1 Complexity
cost.approach.difficulty <- melt(cost.approach.difficulty,
id.vars = c("Difficulty", "Approach"), variable.name = "Phase")
cost.approach.difficulty$Phase <- as.character(cost.approach.difficulty$Phase)
cost.approach.difficulty$Type <- "Difficulty"
colnames(cost.approach.difficulty)[
colnames(cost.approach.difficulty) == "Difficulty"] <- "Size"
head(cost.approach.difficulty)
Size Approach Phase value Type
1 High B Phase.0 18102.778 Difficulty
2 Low B Phase.0 3860.694 Difficulty
3 Medium B Phase.0 5323.694 Difficulty
4 High I Phase.0 18102.778 Difficulty
5 Low I Phase.0 3860.694 Difficulty
6 Medium I Phase.0 5323.694 Difficulty
Once the two tables are in a standardized format, you can call merge
:
cost.table.filled <- merge(cost.table, cost.approach.difficulty,
by = c("Approach", "Size", "Phase", "Type"), all.x = TRUE, all.y = FALSE)
Then, if you didn't look up the values for certain columns, you can re-insert the original values (otherwise you end up with a bunch of NAs):
cost.table.filled$value[is.na(cost.table.filled$value)] <-
cost.table.filled$Size[is.na(cost.table.filled$value)]
Then you can dcast
the thing back into the original format:
cost.table.final <- dcast(cost.table.filled, Identifier + Approach ~ Phase + Type)
head(cost.table.final)
Identifier Approach Phase.0_Difficulty Phase.1_Complexity Phase.2_Complexity Phase.3_Complexity Phase.4_Complexity Phase.5_Complexity Phase.6_Complexity Transaction.Feasibility_Transaction.Feasibility
1 FS1 B 3860.694 Low Low Medium Medium High High Medium
2 FS2 I 18102.778 High High Medium Medium Medium Medium Medium
3 FS3 B 18102.778 Low Low High High High High Medium
4 FS4 I 18102.778 Medium Medium Medium Medium Medium Medium Medium
5 FS5 B 18102.778 Medium Medium High Medium Medium Medium Medium
To replace all columns, I would melt
each lookup table and then cbind
them all together into a single lookup table. That way, you only have to call merge
once and you don't have to worry about replacing NAs.
Upvotes: 4
Reputation: 7123
In this case, merge
should do the trick:
cost.table <- merge(
x = cost.table,
y = cost.approach.difficulty[c("Approach", "Difficulty", "Phase.0")],
by.x = c("Phase.0.Difficulty", "Approach"),
by.y = c("Difficulty", "Approach"), sort = FALSE
)
cost.table$Phase.0.Difficulty <- NULL
names(cost.table)[names(cost.table) == "Phase.0"] <- "Phase.0.Difficulty"
cost.table
Approach Identifier Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity Phase.6.Complexity Transaction.Feasibility Phase.0.Difficulty
1 B FS1 Low Low Medium Medium High High Medium 3860.694
2 I FS2 High High Medium Medium Medium Medium Medium 18102.778
3 I FS4 Medium Medium Medium Medium Medium Medium Medium 18102.778
4 B FS3 Low Low High High High High Medium 18102.778
5 B FS5 Medium Medium High Medium Medium Medium Medium 18102.778
Upvotes: 0