Arrebimbomalho
Arrebimbomalho

Reputation: 186

Conditionally replacing a substring value in a column with substrings of other columns

Let's say you have the following simple dataframe:

Input <- c("X0_1-2 + X1_1-2","X0_1-2 + X1_1-2","X0_1-3 + X1_1-3","X0_3-2 + X1_3-2","X0_3-1 + X1_3-1","X0_2-1 + X1_2-1","X0_2-3 + X1_2-3","X0_13-1 + X1_13-1")
State1 <- c("1-3","1-3","1-2","3-1","3-2","2-1","2-1","13-3")
State2 <- c("1-2","1-2","1-3","3-2","3-1","2-3","2-3","13-1")
DataFrame <- cbind(Input,State1,State2)
DataFrame <- as.data.frame(DataFrame)

Yielding

            Input State1 State2
1 X0_1-2 + X1_1-2    1-3    1-2
2 X0_1-2 + X1_1-2    1-3    1-2
3 X0_1-3 + X1_1-3    1-2    1-3
4 X0_3-2 + X1_3-2    3-1    3-2
5 X0_3-1 + X1_3-1    3-2    3-1
6 X0_2-1 + X1_2-1    2-1    2-3
7 X0_2-3 + X1_2-3    2-1    2-3
8 X0_13-1 + X1_13-1  13-3   13-1

I have tried to come up with a smart way to add an extra column equal to the "Input" column, but such that the values following the "_" are either those of State1 or State2, according to which is different from the corresponding substring in Input, i.e, in this case the desired outcome is

            Input State1 State2          Outcome
1 X0_1-2 + X1_1-2    1-3    1-2 X0_1-3 + X1_1-3
2 X0_1-2 + X1_1-2    1-3    1-2 X0_1-3 + X1_1-3
3 X0_1-3 + X1_1-3    1-2    1-3 X0_1-2 + X1_1-2
4 X0_3-2 + X1_3-2    3-1    3-2 X0_3-1 + X1_3-1
5 X0_3-1 + X1_3-1    3-2    3-1 X0_3-2 + X1_3-2
6 X0_2-1 + X1_2-1    2-1    2-3 X0_2-3 + X1_2-3
7 X0_2-3 + X1_2-3    2-1    2-3 X0_2-1 + X1_2-1
8 X0_13-1 + X1_13-1  13-3    13-1 X0_13-3 + X1_13-3

But have been unsuccessful thus far.

The idea is to replace whatever comes after the _ in the Input field, on both sides of the sum, with either the value of State1 or State2, whichever is different.

Any ideas/input would be highly appreciated. Thanks!

Upvotes: 2

Views: 176

Answers (2)

PKumar
PKumar

Reputation: 11128

I would do this, Assuming df is your dataframe:

replacement <- c("State2","State1")[mapply(grepl, df$State2, df$Input)+1]
df$output <- sapply(1:nrow(df), function(i)gsub( "\\d+-\\d+",df[i, replacement[i]],df[i,"Input"]))

Output:

> df
            Input State1 State2          output
1 X0_1-2 + X1_1-2    1-3    1-2 X0_1-3 + X1_1-3
2 X0_1-2 + X1_1-2    1-3    1-2 X0_1-3 + X1_1-3
3 X0_1-3 + X1_1-3    1-2    1-3 X0_1-2 + X1_1-2
4 X0_3-2 + X1_3-2    3-1    3-2 X0_3-1 + X1_3-1
5 X0_3-1 + X1_3-1    3-2    3-1 X0_3-2 + X1_3-2
6 X0_2-1 + X1_2-1    2-1    2-3 X0_2-3 + X1_2-3
7 X0_2-3 + X1_2-3    2-1    2-3 X0_2-1 + X1_2-1
8 X0_2-1 + X1_2-1    2-3    2-1 X0_2-3 + X1_2-3

Upvotes: 2

De Novo
De Novo

Reputation: 7610

If I understand correctly, the state expressed by Input and Outcome is the same for both the "XO" and the "X1" portion of the string. Also State1 and State2 are never the same. In this case, you can pull out the state from input, compare it to one of the two states, and paste together your output string thus:

output <- ifelse(substring(DataFrame$Input, 13) == State1, State2, State1)
DataFrame$Outcome <- paste("X0_", output, " + X1_", output, sep = "")
DataFrame
#               Input State1 State2           Outcome
# 1   X0_1-2 + X1_1-2    1-3    1-2   X0_1-3 + X1_1-3
# 2   X0_1-2 + X1_1-2    1-3    1-2   X0_1-3 + X1_1-3
# 3   X0_1-3 + X1_1-3    1-2    1-3   X0_1-2 + X1_1-2
# 4   X0_3-2 + X1_3-2    3-1    3-2   X0_3-1 + X1_3-1
# 5   X0_3-1 + X1_3-1    3-2    3-1   X0_3-2 + X1_3-2
# 6   X0_2-1 + X1_2-1    2-1    2-3   X0_2-3 + X1_2-3
# 7   X0_2-3 + X1_2-3    2-1    2-3   X0_2-1 + X1_2-1
# 8 X0_13-1 + X1_13-1   13-3   13-1 X0_13-3 + X1_13-3

This solution will work for any length of "state" substring (e.g., both "1-1" and "201-14") expressed by theInput` variable. You could use regex, but in this case, extracting based on position works (and is more efficient).

Upvotes: 3

Related Questions