Reputation: 159
I need to remove a parenthesis after a number in a string:
"dl_CONH_r = a0cons+a2cons*(CONH_r_lag_1)-a3cons*HGDI_r_lag_1)-(1-a3cons)*HNW_r_lag_2)+a4cons*rate_90_r_lag_1))+a5cons*dl_HCOE_r+a6cons*dl_HOY_r_lag_2)+a7cons*dl_HNW_r_lag_1)+a8cons*d_rate_UNE_lag_2)+(1-a5cons-a6cons-a7cons)*(dl_TREND_PROD+dl_TREND_AVEH+dl_TREND_WAP)"
The resulting string should look like this:
"dl_CONH_r = a0cons+a2cons*(CONH_r_lag_1-a3cons*HGDI_r_lag_1-(1-a3cons)*HNW_r_lag_2+a4cons*rate_90_r_lag_1)+a5cons*dl_HCOE_r+a6cons*dl_HOY_r_lag_2+a7cons*dl_HNW_r_lag_1+a8cons*d_rate_UNE_lag_2+(1-a5cons-a6cons-a7cons)*(dl_TREND_PROD+dl_TREND_AVEH+dl_TREND_WAP)"
The regular expression I am trying to capture here is the first parenthesis after the string "lag_" followed by some number. Note, that in places there are two parenthesis:
rate_90_r_lag_1))
And I only want to remove the first one.
I've tried a simple regex in gsub
a <- "dl_CONH_r = a0cons+a2cons*(CONH_r_lag_1)-a3cons*HGDI_r_lag_1)-(1-a3cons)*HNW_r_lag_2)+a4cons*rate_90_r_lag_1))+a5cons*dl_HCOE_r+a6cons*dl_HOY_r_lag_2)+a7cons*dl_HNW_r_lag_1)+a8cons*d_rate_UNE_lag_2)+(1-a5cons-a6cons-a7cons)*(dl_TREND_PROD+dl_TREND_AVEH+dl_TREND_WAP)"
gsub("[0-9]\\)","[0-9]",a)
But I the resulting string removes the number and replaces it with [0-9]:
"dl_CONH_r = a0cons+a2cons*(CONH_r_lag_[0-9]-a3cons*HGDI_r_lag_[0-9]-(1-a3cons)*HNW_r_lag_[0-9]+a4cons*rate_90_r_lag_[0-9])+a5cons*dl_HCOE_r+a6cons*dl_HOY_r_lag_[0-9]+a7cons*dl_HNW_r_lag_[0-9]+a8cons*d_rate_UNE_lag_[0-9]+(1-a5cons-a6cons-a7cons)*(dl_TREND_PROD+dl_TREND_AVEH+dl_TREND_WAP)"
I understand that the gsub is doing what it is intended to do. What I don't know is how to keep the number before the parenthesis?
Upvotes: 3
Views: 185
Reputation: 473
You can do this using capture groups:
Lets just try it on the string my_string <- " = a0cons+a2cons*(CONH_r_lag_1)-a3cons*"
reg_expression <- "(.*[0-9])\\)(.*)" #two capture groups, with the parenthesis not in a group
my_sub_string <- sub(reg_expression,"\\1\\2", my_string)
Notice "\\1"
reads like \1
to the regex engine, and so is a special character referring to the first capture group. (These can also be named)
Another way of doing this is lookarounds:
There are two basic kinds of lookarounds, a lookahead (?=)
and a lookbehind (?<=)
. Since we want to match a pattern, but not capture, something behind our matched expression we need a lookbehind.
reg_expression <- "(?<=[0-9])\\)" #lookbehind
my_sub_string <- sub(reg_expression,"", my_string)
Which will match the pattern, but only replace the parenthesis.
Upvotes: 1
Reputation: 3636
You need to use a look around (in this case the preceded by) so that it will match just the parentheses as the matching group instead of the numbers and the parentheses. Then you can just remove the parentheses.
gsub("(?<=[0-9])\\)","", a, perl = TRUE)
Upvotes: 1