Reputation: 562
Consider the following variable:
clear
input str18 string
"abc bcd cde"
"def efg fgh"
"ghi hij ijk"
end
I can use the regexm() function to flag the observations that contain abc, cde or def:
generate new = regexm(string, "abc|cde|def")
list
|      string   new |
|-------------------|
| abc bcd cde     1 |
| def efg fgh     1 |
| ghi hij ijk     0 |
How can I get the following?
|      string    wanted |
|-----------------------|
| abc bcd cde   abc cde |
| def efg fgh       def |
| ghi hij ijk           |
This question is an extension of the one answered here:
Upvotes: 1
Views: 283
Reputation: 37208
I read this as your
1. Having a list of allowed words.
2. Wanting the words in a string that occur among the allowed words.
It is fashionable to seek a fancy regular expression solution for such problems, but your example at least yields to a plain loop over the words that exist. Be aware, however, that inlist() has advertised limits on the number of arguments it accepts.
clear
input str18 string
"abc bcd cde"
"def efg fgh"
"ghi hij ijk"
end
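* Start with an empty result and find the largest word count in any observation
generate wanted = ""
generate wc = wordcount(string)
summarize wc, meanonly
* Append the j-th word whenever it is one of the allowed words
quietly forvalues j = 1/`r(max)' {
    replace wanted = wanted + " " + word(string, `j') if inlist(word(string, `j'), "abc", "cde", "def")
}
* Remove the leading space left by the concatenation
replace wanted = trim(wanted)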
list
     +----------------------------+
     |      string    wanted   wc |
     |----------------------------|
  1. | abc bcd cde   abc cde    3 |
  2. | def efg fgh       def    3 |
  3. | ghi hij ijk              3 |
     +----------------------------+
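If the list of allowed words ever outgrows what inlist() accepts, one way around it is a second loop over the allowed words themselves. The following is only a sketch under that assumption: the local macro name allowed is hypothetical, and the example data are the same as above.

```stata
clear
input str18 string
"abc bcd cde"
"def efg fgh"
"ghi hij ijk"
end

* Hypothetical local macro holding the allowed words; it can be as long as needed
local allowed abc cde def

generate wanted = ""
generate wc = wordcount(string)
summarize wc, meanonly

* Outer loop: word positions; inner loop: allowed words
quietly forvalues j = 1/`r(max)' {
    foreach w of local allowed {
        replace wanted = wanted + " " + "`w'" if word(string, `j') == "`w'"
    }
}
replace wanted = trim(wanted)
list
```

Because the outer loop runs over word positions, the matched words are kept in the order in which they occur in string, as in the inlist() version.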
Upvotes: 2
Reputation:
This is the solution using a regular expression:
clear
input str18 string
"abc bcd cde"
"def efg fgh"
"ghi hij ijk"
end
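* Blank out every word in which no allowed word starts at any character
generate wanted = ustrregexra(string, "(\b((?!(abc|cde|def))\w)+\b)", " ")
* Collapse repeated blanks and trim both ends
replace wanted = strtrim(stritrim(wanted))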
list
     +-----------------------+
     |      string    wanted |
     |-----------------------|
  1. | abc bcd cde   abc cde |
  2. | def efg fgh       def |
  3. | ghi hij ijk           |
     +-----------------------+
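One caveat, as far as I can tell: the lookahead in that pattern is tested at every character, so a longer word that merely contains an allowed word (abcd, say) is never matched and therefore stays in the result. If only exact whole-word matches should survive, a stricter variant might look like the sketch below. The variable name strict and the third observation are made up for illustration, and the pattern assumes Stata 14 or later with the ICU-style lookahead and \b that ustrregexra() already relies on above.

```stata
clear
input str18 string
"abc bcd cde"
"def efg fgh"
"abcd bcd cde"
end

* Assumed variant: the lookahead is tested once at the start of each word and
* requires a word boundary right after the allowed word, so every word that
* is not exactly abc, cde or def is blanked out
generate strict = ustrregexra(string, "\b(?!(abc|cde|def)\b)\w+\b", " ")

* Collapse repeated blanks and trim both ends
replace strict = strtrim(stritrim(strict))
list
```

With this pattern, abcd in the third observation should be dropped along with bcd, whereas the original pattern would keep it.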
Upvotes: 1