DED1

Reputation: 43

How do I split a string without deleting delimiters?

My AutoIt script parses text sentence by sentence. Because sentences most likely end in a period, question mark, or exclamation point, I used this to split the text:

$LineArray = StringSplit($displayed_file, "!?.", 2)

The problem is that it deletes the delimiters (the periods, question marks, and exclamation points at the end of sentences). For example, the string One. Two. Three. is split into One, Two, and Three.

How can I split into sentences while retaining the periods, question marks, and exclamation points that end these sentences?

Upvotes: 0

Views: 305

Answers (2)

Stephan

Reputation: 56180

StringSplit() consumes the delimiters while splitting, so they are lost from the result. Use StringRegExp() instead:

#include <array.au3>
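$string = "This is a text. It has several sentences. Really? Of Course!"
; (?U) makes quantifiers lazy, so .* stops at the first [.?!]; flag 3 returns an array of all matches
$a = StringRegExp($string, "(?U)(.*[.?!])", 3)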
_ArrayDisplay($a)

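To remove leading spaces, change the pattern to "(?U)[ ]*?(.*[.?!])". Under (?U), quantifiers are lazy by default and a trailing ? makes them greedy, so [ ]*? consumes all leading spaces. To split only at [.!?] followed by a space, use "(?U)[ ]*?(.*[.?!] )" and append a space to the string so that the last sentence also matches: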

#include <array.au3>
$string = "Do you know Pi?   Yes!   What's it?    It's 3.14159!   That's correct."
$a = StringRegExp($string & " ", "(?U)[ ]*?(.*[.?!] )", 3)
_ArrayDisplay($a)

To preserve @CRLF (\r\n) inside sentences:

#include <array.au3>
$string = "Do you " & @CRLF & "know Pi?   Yes!  What's it?    It's" & @CRLF & "3.14159!   That's correct."
$a = StringRegExp($string & "  ", "(?s)(?U)[ ]*?(.*[.?!][ \R] )", 3)
_ArrayDisplay($a,"Sentences")   ;_ArrayDisplay doesn't show @CRLF

For $i In $a
    ;MsgBox(0,"",$i)
    ConsoleWrite(StringStripWS($i, 3) & @CRLF & "---------" & @CRLF)
Next

Note that this does not keep the @CRLF when the end of a line coincides with the end of a sentence, as in: ...line end!" & @CRLF & "Next line....
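
A possible variation that also splits where a sentence ends at a line break: instead of requiring trailing spaces, use a lookahead that accepts any whitespace (including @CRLF) or the end of the string after the punctuation. This is only a sketch; the sample text and the names $sText and $aSentences are made up for illustration:

```autoit
#include <Array.au3>

; Hypothetical sample text: one sentence contains a line break,
; and one sentence ends exactly at a line break.
Local $sText = "Do you " & @CRLF & "know Pi?   Yes!" & @CRLF & "It's 3.14159!   That's correct."

; (?s)       let . match line breaks, so @CRLF inside a sentence is kept
; \s*        skip whitespace (spaces or line breaks) between sentences
; (.*?[.?!]) capture as little as possible up to a sentence-ending mark
; (?=\s|\z)  accept that mark only before whitespace or the end of the text
Local $aSentences = StringRegExp($sText, "(?s)\s*(.*?[.?!])(?=\s|\z)", 3)

If @error Then
    ConsoleWrite("No sentences found." & @CRLF)
Else
    _ArrayDisplay($aSentences, "Sentences")
EndIf
```

With this pattern nothing needs to be appended to the string, and the decimal point in 3.14159 is not treated as a sentence end because it is not followed by whitespace. The whitespace between sentences, including a @CRLF that falls exactly on a sentence boundary, is discarded rather than kept.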

Upvotes: 0

Xenobiologist

Reputation: 2151

Try this:

#include<Array.au3>
Global $str = "One. Two. Three. This is a test! Does it work? Yes, man! "
$re = StringRegExp($str, '(.*?[.!?])', 3)
_ArrayDisplay($re)

This pattern drops the space at the beginning of each sentence, because \S forces every match to start at a non-whitespace character:

#include<Array.au3>
Global $str = "One. Two. Three.This is a test! Does it work? Yes, man! "
$re = StringRegExp($str, '(\S.*?[.!?])', 3)
_ArrayDisplay($re)

Upvotes: 0
