Reputation: 15
I'm having some trouble finding an appropriate method for string substitution. I would like to replace every character in a string except for a selection of words or phrases (provided in an array). I know there's a gsub method, but I guess what I'm trying to achieve is its reverse. For example:
My string: "Part of this string needs to be substituted"
Keywords: ["this string", "substituted"]
Desired output: "**** ** this string ***** ** ** substituted"
ps. It's my first question ever, so your help will be greatly appreciated!
Upvotes: 1
Views: 1779
Reputation: 110725
You can do that using the form of String#split that uses a regex with a capture group.
Code
def sub_some(str, keywords)
  str.split(/(#{keywords.join('|')})/)
     .map { |s| keywords.include?(s) ? s : s.gsub(/./) { |c| (c == ' ') ? c : '*' } }
     .join
end
Example
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
sub_some(str, keywords)
#=> "**** ** this string ***** ** ** substituted"
Explanation
r = /(#{keywords.join('|')})/
#=> /(this string|substituted)/
a = str.split(r)
#=> ["Part of ", "this string", " needs to be ", "substituted"]
e = a.map
#=> #<Enumerator: ["Part of ", "this string", " needs to be ",
# "substituted"]:map>
s = e.next
#=> "Part of "
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> "Part of ".gsub(/./) { |c| (c==' ') ? c : '*' }
#=> "**** ** "
s = e.next
#=> "this string"
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
#=> s
#=> "this string"
and so on... Lastly,
["**** ** ", "this string", " ***** ** ** ", "substituted"].join
#=> "**** ** this string ***** ** ** substituted"
Note that, prior to v.1.9.3, Enumerable#map did not return an enumerator when no block is given. The calculations are the same, however.
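One caveat, as a side note: keywords.join('|') puts the keywords into the regex verbatim, so a keyword containing regex metacharacters (such as $ or parentheses) would be misread. Below is a minimal sketch of a variant that escapes them with Regexp.union. The method name sub_some_escaped and the sample sentence are made up for illustration, and it assumes keywords should match literally.

```ruby
# Hypothetical variant of sub_some that treats every keyword as literal text.
def sub_some_escaped(str, keywords)
  # Longer keywords first, so a keyword that is a prefix of another does not shadow it.
  ordered = keywords.sort_by { |k| -k.length }
  # Regexp.union escapes each string; the capture group makes split keep the keywords.
  pattern = /(#{Regexp.union(ordered)})/
  str.split(pattern)
     .map { |s| keywords.include?(s) ? s : s.gsub(/\S/, '*') }
     .join
end

puts sub_some_escaped("Price is $5 (approx.) today", ["$5", "(approx.)"])
# prints: ***** ** $5 (approx.) *****
```

With plain keywords.join('|') the same input would build the pattern /($5|(approx.))/, which does not match the literal text "$5".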
Upvotes: 0
Reputation: 96
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
pattern = /(#{keywords.join('|')})/
str.split(pattern).map {|i| keywords.include?(i) ? i : i.gsub(/\S/,"*")}.join
#=> "**** ** this string ***** ** ** substituted"
A more readable version of the same code
str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
# Use a regexp pattern to split the string around the keywords.
pattern = /(#{keywords.join('|')})/ # pattern => /(this string|substituted)/
parts = str.split(pattern) #=> ["Part of ", "this string", " needs to be ", "substituted"]
redacted = parts.map do |i|
  if keywords.include?(i)
    i
  else
    i.gsub(/\S/, "*") # replace all non-whitespace characters with "*"
  end
end
# redacted => ["**** ** ", "this string", " ***** ** ** ", "substituted"]
redacted.join
#=> "**** ** this string ***** ** ** substituted"
Upvotes: 0
Reputation: 627082
You can use the following approach: collect the substrings that you need to turn into asterisks, and then perform this replacement:
str = "Part of this string needs to be substituted"
arr = ["this string", "substituted"]
# Split on the kept phrases and on whitespace; the remaining pieces are the words to mask.
pattern = Regexp.new("\\b(?:" + arr.map { |x| Regexp.escape(x) }.join('|') + ")\\b|\\s+")
arr_to_remove = str.split(pattern).reject { |s| s.empty? }
arr_to_remove.each do |s|
  str = str.gsub(s, "*" * s.length)
end
puts str
Output of the demo program:
**** ** this string ***** ** ** substituted
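A caveat with the loop above: str.gsub(s, ...) replaces every occurrence of s, so a masked word that also appears inside a kept phrase (for example "is" inside "this string") would corrupt that phrase. Here is a sketch of a single-pass alternative that avoids this. The name redact_except is hypothetical, and the sketch assumes keywords match literally and without word-boundary checks.

```ruby
# Hypothetical helper: mask every non-whitespace character except inside the kept phrases.
def redact_except(str, keywords)
  # At each position, try the kept phrases first (capture group 1);
  # if none matches, fall back to a single non-whitespace character.
  pattern = /(#{Regexp.union(keywords)})|\S/
  # Group 1 is set only when a kept phrase matched, so return it unchanged;
  # otherwise the match was a lone character, which becomes '*'.
  str.gsub(pattern) { Regexp.last_match(1) || '*' }
end

puts redact_except("Part of this string needs to be substituted", ["this string", "substituted"])
# prints: **** ** this string ***** ** ** substituted
```

Because whitespace is never matched by the pattern, it passes through untouched, and each character of the input is visited only once.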
Upvotes: 0
Reputation: 357
Here's a different approach. First, do the reverse of what you ultimately want: redact what you want to keep. Then compare this redacted string to your original character by character, and if the characters are the same, redact, and if they are not, keep the original.
class String
  # Returns a string with all words except those passed in as keepers
  # redacted.
  #
  #   "Part of this string needs to be substituted".gsub_except(["this string", "substituted"], '*')
  #   # => "**** ** this string ***** ** ** substituted"
  def gsub_except(keep, mark)
    reverse_keep = self.dup
    keep.each_with_object(Hash.new(0)) { |e, a| a[e] = mark * e.length }
        .each { |word, redacted| reverse_keep.gsub! word, redacted }
    reverse_keep.chars.zip(self.chars).map do |redacted, original|
      redacted == original && original != ' ' ? mark : original
    end.join
  end
end
Upvotes: 1
Reputation: 2841
This might be a little more understandable than my last answer:
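var s = "Part of this string needs to be substituted"
var k = ["this string", "substituted"]
// Build a mask: each keyword is replaced by stars of the same length.
var tmp = s
for (var i = 0; i < k.length; i++) {
  tmp = tmp.replace(k[i], function(x){ return "*".repeat(x.length) })
}
// Keep the original character where the mask has a star or a space; star out the rest.
var res = s.split("")
for (var charIdx = 0; charIdx < s.length; charIdx++) {
  if (tmp[charIdx] != "*" && tmp[charIdx] != " ") {
    res[charIdx] = "*"
  } else {
    res[charIdx] = s.charAt(charIdx)
  }
}
var finalResult = res.join("")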
Explanation:
This builds on my previous idea of using the positions of the keywords to replace portions of the string with stars. First off:
For each of the keywords, we replace it with stars of the same length. So:
s.replace("this string", function(x){
  return "*".repeat(x.length)
})
replaces the portion of s that matches "this string" with x.length *'s.
We do this for each key. For completeness, you should make sure that the replace is global and not just the first match found, e.g. /this string/g. I didn't do it in the answer, but I think you should be able to figure out how to use new RegExp by yourself.
Next up, we split a copy of the original string into an array. If you're a visual person, it should make sense to think of this as a weird sort of character addition:
"Part of this string needs to be substituted"
"Part of *********** needs to be ***********" +
---------------------------------------------
 **** ** this string ***** ** ** substituted
is what we're going for. So if our tmp variable has a star at a given position, we bring over the original character, and otherwise we replace the character with a *.
This is easily done with an if statement. And to make it like your example in the question, we also bring over the original character if it's a space. Lastly, we join the array back into a string via .join("") so that you can work with a string again.
Makes sense?
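Since the question is about Ruby, here is a rough Ruby sketch of the same mask-and-merge idea. The name mask_and_merge is hypothetical, and the sketch assumes the input contains no literal * characters. Ruby's String#gsub already replaces every match, so no global flag is needed.

```ruby
# Hypothetical Ruby port of the mask-and-merge idea described above.
def mask_and_merge(str, keywords)
  # Step 1: build a mask in which every occurrence of each keyword
  # is replaced by stars of the same length.
  mask = keywords.reduce(str) { |acc, kw| acc.gsub(kw) { |m| '*' * m.length } }
  # Step 2: walk the original and the mask in parallel. Keep the original
  # character where the mask has a star (inside a keyword) or where the
  # original is a space; star out everything else.
  str.each_char.zip(mask.each_char).map do |orig, masked|
    (masked == '*' || orig == ' ') ? orig : '*'
  end.join
end

puts mask_and_merge("Part of this string needs to be substituted", ["this string", "substituted"])
# prints: **** ** this string ***** ** ** substituted
```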
Upvotes: 0
Reputation: 84
You can use something like:
str="Part of this string needs to be substituted"
keep = ["this","string", "substituted"]
str.split(" ").map { |word| keep.include?(word) ? word : "*" * word.length }.join(" ")
Note that this only keeps individual words, not phrases.
Upvotes: 0