Christos K.

Reputation: 99

How do I remove lines that begin with a specific string from a string in Lua?

How do I remove the lines of a string that begin with another string in Lua? For instance, I want to remove every line of the string result that begins with the word <Table. This is the code I've written so far:

for line in result:gmatch"<Table [^\n]*" do line = "" end

Upvotes: 2

Views: 7739

Answers (4)

Tyler

Reputation: 28874

result = result:gsub('%f[^\n%z]<Table [^\n]*', '')

The start of this pattern, '%f[^\n%z]', is a frontier pattern. It matches at any transition from a newline or NUL character to any other character, and for frontier patterns the position before the first character of the subject counts as a NUL character. In other words, that prefix lets the rest of the pattern match at the start of the first line or at the start of any later line.

Reference: the Lua 5.3 manual, section 6.4.1 on string patterns
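
As a quick check of how the pattern behaves, here is a small sketch on a made-up sample string (the sample text and variable names are illustrative, not from the question). It shows that the newline after each removed line is left behind, and one possible variant, not part of the answer above, that consumes it by appending \n? to the pattern. The %z class is kept exactly as written above; it has been deprecated since Lua 5.2, where "\0" may be written in a pattern instead.

```lua
-- Hypothetical sample input; only lines 1 and 3 start with "<Table ".
local sample = "<Table a>\nkeep me\n<Table b>\nnot <Table c>\nlast"

-- Pattern from the answer: the text of each matching line is removed,
-- but the newline that followed it stays behind.
local stripped = sample:gsub('%f[^\n%z]<Table [^\n]*', '')

-- Variant (an assumption, not part of the answer): also consume one
-- trailing newline so that no empty line is left.
local compact = sample:gsub('%f[^\n%z]<Table [^\n]*\n?', '')

print(stripped)
print(compact)
```

The line "not <Table c>" survives in both cases because its < is preceded by a space, so the frontier does not match there.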

Upvotes: 0

RBerteig

Reputation: 43326

The other answers provide good solutions to actually stripping lines from a string, but don't address why your code is failing to do that.

Reformatting for clarity, you wrote:

for line in result:gmatch"<Table [^\n]*" do 
    line = "" 
end

The first part is a reasonable way to iterate over result and extract all spans of text that begin with <Table and continue up to but not including the next newline character. The iterator returned by gmatch returns a copy of the matching text on each call, and the local variable line holds that copy for the body of the for loop.

Since the matching text is copied to line, changes made to line do not, and cannot, modify the actual text stored in result.
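
A minimal sketch of that point, using a made-up value for result: after the loop has run, result is unchanged, because the assignment only rebinds the loop's local variable. The counter seen is a hypothetical addition to show that the loop body did execute.

```lua
-- Hypothetical sample value standing in for the asker's `result`.
local result = "<Table a>\nkeep me\n<Table b>"
local before = result

local seen = 0
for line in result:gmatch("<Table [^\n]*") do
  seen = seen + 1
  line = ""  -- rebinds the local `line`; `result` is not touched
end

print(seen, result == before)
```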

This is due to a more fundamental property of Lua strings: all strings in Lua are immutable. Once created, they cannot be changed. A variable holding a string actually holds a reference to an immutable string object managed by the Lua runtime, which permits only two operations: creation of a new string, and garbage collection of a string with no remaining references.

So any approach to editing the content of the string stored in result is going to require the creation of an entirely new string. Where string.gmatch provides an iteration over the content but cannot allow it to be changed, string.gsub provides for creation of a new string where all text matching a pattern has been replaced by something new. But even string.gsub is not changing the immutable source text; it is creating a new immutable string that is a copy of the old with substitutions made.

Using gsub could be as simple as this:

result = result:gsub("<Table [^\n]*", "")

but that will disclose other defects in the pattern itself. First, and most obviously, nothing requires that the pattern match at only the beginning of the line. Second, the pattern does not include the newline, so it will leave the line present but empty.
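
Both defects show up on a small made-up input (the sample text is an assumption for illustration): a <Table that appears mid-line is removed as well, and the line that did start with <Table leaves an empty line behind.

```lua
-- Hypothetical input: "<Table " appears mid-line on line 1 and at the
-- start of line 2.
local result = "keep <Table x>\n<Table a>\nlast"

-- gsub returns the new string and the number of substitutions made.
local stripped, count = result:gsub("<Table [^\n]*", "")

print(count)
print(stripped)
```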

All of that can be refined by careful and clever use of the pattern library. But it doesn't change the fact that you are starting with XML text and are not handling it with XML-aware tools. In that case, any approach based on pattern matching or even regular expressions is likely to end in tears.

Upvotes: 1

Philipp Gesang

Reputation: 526

The LPEG library is perfect for this kind of task. Just write a function to create custom line strippers:

local mk_striplines
do
  local lpeg      = require "lpeg"
  local P         = lpeg.P
  local Cs        = lpeg.Cs
  local lpegmatch = lpeg.match

  local eol       = P"\n\r" + P"\r\n" + P"\n" + P"\r"  -- any line ending, including a lone "\r"
  local eof       = P(-1)
  local linerest  = (1 - eol)^1 * (eol + eof) + eol

  mk_striplines = function (pat)
    pat               = P (pat)
    local matchline   = pat * linerest
    local striplines  = Cs (((matchline / "") + linerest)^1)
    return function (str)
      return lpegmatch (striplines, str)
    end
  end
end

Note that the argument to mk_striplines() may be a string or a pattern. Thus the result is very flexible: mk_striplines (P"<Table" + P"</Table>") would create a stripper that drops lines with two different patterns. mk_striplines (P"x" * P"y"^0) drops each line starting with an x followed by any number of y’s -- you get the idea.

Usage example:

local linestripper = mk_striplines "foo"

local test = [[
foo lorem ipsum
bar baz
buzz
foo bar
xyzzy
]]

print (linestripper (test))

Upvotes: 1

Yu Hao

Reputation: 122383

string.gmatch is used to iterate over all occurrences of a pattern. To replace the text matched by a pattern, you need to use string.gsub.

Another problem is that your pattern <Table [^\n]* will match in any line containing the word <Table, not only in lines that begin with it.

Lua patterns have no beginning-of-line anchor (^ anchors only at the start of the whole string), so this almost works:

local str = result:gsub("\n<Table [^\n]*", "")

except that it will miss a match on the first line. My solution is to use a second pass to handle the first line:

local str1 = result:gsub("\n<Table [^\n]*", "")
local str2 = str1:gsub("^<Table [^\n]*\n", "")
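
The two passes can be wrapped in a small helper. This is only a sketch: the function name strip_table_lines and the sample text are made up for illustration, and the second pattern uses \n? (an assumption beyond the code above) so that a string consisting of a single <Table line is also handled.

```lua
-- Hypothetical helper combining the two passes described above.
local function strip_table_lines(s)
  -- Pass 1: every line after the first. The "\n" before the line is
  -- consumed together with it, so no empty line remains.
  s = s:gsub("\n<Table [^\n]*", "")
  -- Pass 2: the first line, anchored with ^ at the start of the string.
  s = s:gsub("^<Table [^\n]*\n?", "")
  return s
end

-- Made-up sample: lines 1 and 3 start with "<Table ".
local sample = "<Table a>\nkeep me\n<Table b>\nnot <Table c>\nlast"
print(strip_table_lines(sample))
```

Assigning the result of gsub to a single variable keeps only the new string and discards the substitution count that gsub returns as its second value.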

Upvotes: 1
