Reputation: 186
Let's say you have the following simple dataframe:
Input <- c("X0_1-2 + X1_1-2","X0_1-2 + X1_1-2","X0_1-3 + X1_1-3","X0_3-2 + X1_3-2","X0_3-1 + X1_3-1","X0_2-1 + X1_2-1","X0_2-3 + X1_2-3","X0_13-1 + X1_13-1")
State1 <- c("1-3","1-3","1-2","3-1","3-2","2-1","2-1","13-3")
State2 <- c("1-2","1-2","1-3","3-2","3-1","2-3","2-3","13-1")
DataFrame <- cbind(Input,State1,State2)
DataFrame <- as.data.frame(DataFrame)
Yielding
Input State1 State2
1 X0_1-2 + X1_1-2 1-3 1-2
2 X0_1-2 + X1_1-2 1-3 1-2
3 X0_1-3 + X1_1-3 1-2 1-3
4 X0_3-2 + X1_3-2 3-1 3-2
5 X0_3-1 + X1_3-1 3-2 3-1
6 X0_2-1 + X1_2-1 2-1 2-3
7 X0_2-3 + X1_2-3 2-1 2-3
8 X0_13-1 + X1_13-1 13-3 13-1
I have tried to come up with a smart way to add an extra column equal to the "Input" column, but such that the values following the "_" are either those of State1 or State2, according to which is different from the corresponding substring in Input, i.e, in this case the desired outcome is
Input State1 State2 Outcome
1 X0_1-2 + X1_1-2 1-3 1-2 X0_1-3 + X1_1-3
2 X0_1-2 + X1_1-2 1-3 1-2 X0_1-3 + X1_1-3
3 X0_1-3 + X1_1-3 1-2 1-3 X0_1-2 + X1_1-2
4 X0_3-2 + X1_3-2 3-1 3-2 X0_3-1 + X1_3-1
5 X0_3-1 + X1_3-1 3-2 3-1 X0_3-2 + X1_3-2
6 X0_2-1 + X1_2-1 2-1 2-3 X0_2-3 + X1_2-3
7 X0_2-3 + X1_2-3 2-1 2-3 X0_2-1 + X1_2-1
8 X0_13-1 + X1_13-1 13-3 13-1 X0_13-3 + X1_13-3
But have been unsuccessful thus far.
The idea is to replace whatever comes after the _ in the Input field, on both sides of the sum, with either the value of State1 or State2, whichever is different.
Any ideas/input would be highly appreciated. Thanks!
Upvotes: 2
Views: 176
Reputation: 11128
I would do this, Assuming df is your dataframe:
replacement <- c("State2","State1")[mapply(grepl, df$State2, df$Input)+1]
df$output <- sapply(1:nrow(df), function(i)gsub( "\\d+-\\d+",df[i, replacement[i]],df[i,"Input"]))
Output:
> df
Input State1 State2 output
1 X0_1-2 + X1_1-2 1-3 1-2 X0_1-3 + X1_1-3
2 X0_1-2 + X1_1-2 1-3 1-2 X0_1-3 + X1_1-3
3 X0_1-3 + X1_1-3 1-2 1-3 X0_1-2 + X1_1-2
4 X0_3-2 + X1_3-2 3-1 3-2 X0_3-1 + X1_3-1
5 X0_3-1 + X1_3-1 3-2 3-1 X0_3-2 + X1_3-2
6 X0_2-1 + X1_2-1 2-1 2-3 X0_2-3 + X1_2-3
7 X0_2-3 + X1_2-3 2-1 2-3 X0_2-1 + X1_2-1
8 X0_2-1 + X1_2-1 2-3 2-1 X0_2-3 + X1_2-3
Upvotes: 2
Reputation: 7610
If I understand correctly, the state expressed by Input
and Outcome
is the same for both the "XO"
and the "X1"
portion of the string. Also State1
and State2
are never the same. In this case, you can pull out the state from input, compare it to one of the two states, and paste together your output string thus:
output <- ifelse(substring(DataFrame$Input, 13) == State1, State2, State1)
DataFrame$Outcome <- paste("X0_", output, " + X1_", output, sep = "")
DataFrame
# Input State1 State2 Outcome
# 1 X0_1-2 + X1_1-2 1-3 1-2 X0_1-3 + X1_1-3
# 2 X0_1-2 + X1_1-2 1-3 1-2 X0_1-3 + X1_1-3
# 3 X0_1-3 + X1_1-3 1-2 1-3 X0_1-2 + X1_1-2
# 4 X0_3-2 + X1_3-2 3-1 3-2 X0_3-1 + X1_3-1
# 5 X0_3-1 + X1_3-1 3-2 3-1 X0_3-2 + X1_3-2
# 6 X0_2-1 + X1_2-1 2-1 2-3 X0_2-3 + X1_2-3
# 7 X0_2-3 + X1_2-3 2-1 2-3 X0_2-1 + X1_2-1
# 8 X0_13-1 + X1_13-1 13-3 13-1 X0_13-3 + X1_13-3
This solution will work for any length of "state" substring (e.g., both "1-1" and "201-14") expressed by the
Input` variable. You could use regex, but in this case, extracting based on position works (and is more efficient).
Upvotes: 3