user1964248

Reputation: 23

Lua - Removing words that are not in a list

I want to remove words that are not in a list from a string.

For example, given the string "i like pie and cake" or "pie and cake is good", I want to remove every word that is not "pie" or "cake" and end up with the string "pie cake".

It would be great if the words to keep could be loaded from a table.

Upvotes: 2

Views: 869

Answers (3)

dualed

Reputation: 10502

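The following also implements the last part of the request (I hope), namely that "the words it does not delete could be loaded from a table". Note that the table here lists the words to remove, and the function also returns a table of the words it kept: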

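-- Removes every word that is a key in `words`; returns the stripped
-- string and a table of the words that were kept.
function stripwords(str, words)
    local w = {}
    -- gsub returns two values; here it is truncated to just the string.
    return str:gsub("([^%s.,!?]+)%s*", function(word)
        if words[word] then return "" end
        w[#w+1] = word  -- returning nil leaves the match unchanged
    end), w
end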

Keep in mind that Lua's pattern matcher is not multibyte-aware. This is why I used the pattern above, which matches anything that is not whitespace or basic punctuation instead of relying on %a. If you don't care about multibyte strings, you can use something like "(%a+)%s*". In that case I would also run the words through string.upper so that the comparison is case-insensitive.
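A minimal sketch of that ASCII-only, case-insensitive variant follows. The name `stripwords_nocase` is hypothetical, and the sketch assumes that every key in the blacklist table is stored in upper case:

```lua
-- Sketch only: ASCII words, case-insensitive lookup.
-- Assumption: all keys in `words` are upper case.
local function stripwords_nocase(str, words)
  local kept = {}
  local result = str:gsub("(%a+)%s*", function(word)
    -- Drop a blacklisted word together with its trailing whitespace.
    if words[word:upper()] then return "" end
    -- Returning nothing keeps the original match unchanged.
    kept[#kept + 1] = word
  end)
  return result, kept
end

local blacklist = { SOME = true, ARE = true }
local r, kept = stripwords_nocase("Some words ARE here!", blacklist)
print(r)                         --> words here!
print(table.concat(kept, ","))   --> words,here
```

Upper-casing only the lookup key, rather than the whole input, leaves the original capitalization of the kept words intact.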

Tests / Usage

local blacklist = { some = true, are = true, less = true, politics = true }
print((stripwords("There are some nasty words in here!", blacklist)))

local r, t = stripwords("some more are in politics here!", blacklist);
print(r);
for k,v in ipairs(t) do  -- ipairs guarantees the kept words print in order
    print(k, v);
end

Upvotes: 0

W.B.

Reputation: 5525

local function stripwords(inputstring, inputtable)
  local retstring = {}
  local itemno = 1;
  for w in string.gmatch(inputstring, "%a+") do
     if inputtable[w] then
       retstring[itemno] = w
       itemno = itemno + 1
     end
  end

  return table.concat(retstring, " ")
end

This works provided that the words you want to keep are all keys of inputtable (for example, { pie = true, cake = true }).
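As a usage sketch, if the words to keep start out as a plain array, they can be converted into the keyed form first. The helper name `toset` is hypothetical, and the function is repeated in condensed form only so that the snippet runs on its own:

```lua
-- Hypothetical helper: turn {"pie", "cake"} into {pie = true, cake = true}.
local function toset(list)
  local set = {}
  for _, word in ipairs(list) do
    set[word] = true
  end
  return set
end

-- Same logic as the function above, repeated so this runs standalone.
local function stripwords(inputstring, inputtable)
  local kept = {}
  for w in string.gmatch(inputstring, "%a+") do
    if inputtable[w] then
      kept[#kept + 1] = w
    end
  end
  return table.concat(kept, " ")
end

local keep = toset({ "pie", "cake" })
print(stripwords("i like pie and cake", keep))   --> pie cake
print(stripwords("pie and cake is good", keep))  --> pie cake
```

Because the kept words are joined with table.concat, the result has no trailing space to trim.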

Upvotes: 3

lhf

Reputation: 72312

Here's another solution, but you may need to trim the last space in the result.

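acceptable = { "pie", "cake" }
-- Map each accepted word to itself plus a separating space.
for k,v in ipairs(acceptable) do acceptable[v]=v.." " end
-- Any other word looks up as "", so gsub deletes it.
setmetatable(acceptable,{__index= function () return "" end})

function strip(s,t)
    s=s.." "  -- make sure the last word is followed by a space
    print('"'..s:gsub("(%a+) %s*",t)..'"')
end

strip("i like pie and cake",acceptable)   --> "pie cake "
strip("pie and cake is good",acceptable)  --> "pie cake "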

gsub is the key point here. There are other variations using gsub and a function, instead of setting a metatable for acceptable.
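One such variation might look like the sketch below, which passes a function to gsub instead of a table with a metatable and also trims the trailing space. It assumes the accepted words are stored as keys of a table, here called `keep`, and it returns the result rather than printing it:

```lua
-- Sketch of a function-based variant; `keep` is a set of words to retain.
local function strip(s, keep)
  -- Replace each word (plus any trailing whitespace) with the word and
  -- one space if it is accepted, or with nothing otherwise.
  local out = s:gsub("(%a+)%s*", function(word)
    if keep[word] then return word .. " " end
    return ""
  end)
  -- Trim the space left after the last kept word; the parentheses
  -- discard gsub's second return value (the substitution count).
  return (out:gsub("%s+$", ""))
end

local keep = { pie = true, cake = true }
print(strip("i like pie and cake", keep))   --> pie cake
print(strip("pie and cake is good", keep))  --> pie cake
```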

Upvotes: 4
