Reputation: 6548
I am trying to create a regex which attempts to match a sentence.
Here is a snippet.
local utf8 = require 'lua-utf8'
function matchsent(text)
local text = text
for sent in utf8.gmatch(text, "[^\r\n]+\.[\r\n ]") do
print(sent)
print('-----')
end
end
However, it does not work the way it would in Python, for example. I know that Lua uses a different set of patterns and that its regex capabilities are limited, but why does the regex above give me a syntax error? And what would a sentence-matching regex look like in Lua?
Upvotes: 3
Views: 895
Reputation: 627336
Note that Lua uses Lua patterns, which are not "regular" expressions, since they cannot describe every regular language (there is no alternation, for instance). They can hardly be used to split a text into sentences, because you would need to account for abbreviations, spacing, letter case, and so on. To split a text into sentences, you need an NLP package rather than one or two regexps, due to the complexity of the task.
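Regarding "why does the regex above give me a syntax error?": in Lua 5.2 and later, \. is not a valid escape sequence inside a string literal, so the script fails to compile before the pattern is ever used. In Lua patterns, special symbols are escaped with a % symbol instead, so use %. (not \.) to match a literal dot. See the example code: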
function matchsent(text)
    for sent in string.gmatch(text, '[^\r\n]+%.[\r\n ]') do
        print(sent)
        print("---")
    end
end
matchsent("Some text here.\nShow me")
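As a rough illustration of both points (the % escape and the limits of pattern-based splitting), here is a minimal sketch. The helper name split_sentences and the lazy pattern are assumptions made for this example, not part of the code above. It collects the matches into a table instead of printing them, and it uses the lazy quantifier .- so that several sentences on one line are separated.

```lua
-- Hypothetical helper for illustration only: a naive sentence splitter
-- built on a Lua pattern. It is NOT a reliable sentence tokenizer.
local function split_sentences(text)
    local sentences = {}
    -- A trailing space is appended so that a final sentence ending in "."
    -- is still followed by whitespace and therefore matches.
    -- %s*      skips leading whitespace
    -- (.-%.)   lazily captures up to and including a literal dot
    -- %s       requires whitespace right after the dot
    for sent in (text .. " "):gmatch("%s*(.-%.)%s") do
        sentences[#sentences + 1] = sent
    end
    return sentences
end

for _, s in ipairs(split_sentences("One. Two.\nDr. Who left.")) do
    print(s)
end
```

With this input the loop prints "One.", "Two.", "Dr." and "Who left." on separate lines. The abbreviation "Dr." is treated as a sentence of its own, which is exactly the kind of case that makes a pattern-only approach unreliable.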
Upvotes: 2