Reputation: 47
I need to split a string in Python without removing the delimiter.
For example:
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
content = content.split('\s\d\s')
After this, I get:
This\n
string is very big\n
i need to split it\n
into paragraph wise.\n
But this string\n
not a formated string.
But I want this:
This\n
1 string is very big\n
2 i need to split it\n
3 into paragraph wise.\n
4 But this string\n
5 not a formated string
Upvotes: 3
Views: 4139
Reputation: 4236
If it is only a question of newlines, use the string method splitlines() with keepends=True:
>>> "This\nis\na\ntest".splitlines(True)
['This\n', 'is\n', 'a\n', 'test']
Otherwise, you can write your own split that keeps the delimiter at the end of each piece:
def split(s, d="\n"):
    # Split s on d, keeping d at the end of each piece.
    d = str(d)
    if d == "":
        raise ValueError("empty separator")
    f = s.find(d)
    if f == -1:
        return [s]
    l = []
    li = 0  # Last index
    add = len(d)
    while f != -1:
        l.append(s[li:f + add])
        li = f + add
        f = s.find(d, li)
    e = s[li:]
    if e:
        l.append(e)
    return l
Upvotes: 0
Reputation: 931
You can try this:
import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
[i.group(0).strip() for i in re.finditer(r'\S\d?[^\d]+', content)]
Each match stops when it reaches a digit, but a digit at the start of a match is allowed.
The output is:
['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']
Upvotes: 0
Reputation: 51997
You could use re.split with a lookahead:
import re
re.split(r'\s(?=\d\s)', content)
resulting in:
['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']
This splits on whitespace characters, but only those that are immediately followed by a digit and then another whitespace character.
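For completeness, here is a self-contained sketch of the lookahead split, assuming the same content string as in the question (wrapped across two literals only for readability). Because the lookahead consumes no characters, each digit stays at the start of the piece that follows it.

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# Split on a whitespace character only when "<digit><whitespace>" follows it.
# The lookahead (?=...) is zero-width, so the digit is not removed.
parts = re.split(r'\s(?=\d\s)', content)

# Print one piece per line, as asked for in the question.
print('\n'.join(parts))
```

Note that \d matches a single digit, so a marker such as 10 would need \d+ instead.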
Upvotes: 1
Reputation: 919
Use the re module provided by Python.
With re.sub you can match a pattern and replace it with the string you want.
In the replacement, \g<0> refers to the entire match (in this case the digit together with the whitespace around it).
Example:
import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
result = re.sub(r'\s\d\s',r'\n\g<0>',content)
The result would be:
'This\n 1 string is very big\n 2 i need to split it\n 3 into paragraph wise.\n 4 But this string\n 5 not a formated string.'
Here are more in-depth details about re.sub.
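Since \g<0> is the whole match, the leading space of each match ends up right after the inserted newline. A possible variation (not part of the original answer) is to capture only the digit and the space after it, so the whitespace before the digit is replaced by the newline. This is a sketch assuming the same content string as above:

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# Group 1 holds "<digit><whitespace>"; the whitespace before it is dropped
# and a newline is inserted in its place.
result = re.sub(r'\s(\d\s)', r'\n\1', content)

# If a list is needed rather than one string, split on the new line breaks.
lines = result.splitlines()
print(result)
```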
Upvotes: 3
Reputation: 4412
Why not just store the output, iterate over it, and place your delimiters back where you want them? If the delimiters need to change each time, you could use the index of the loop that you use to iterate to decide what they are/need to be.
You might find this post useful.
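One way to sketch this idea, assuming the delimiters are captured with re.split rather than tracked by a loop index: when the pattern contains a capturing group, re.split returns the captured delimiters in between the pieces, so they can be put back while iterating.

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# With a capturing group, the result alternates text and delimiter:
# [text, digit, text, digit, ..., text]
pieces = re.split(r'\s(\d)\s', content)

# Keep the first piece as is, then re-attach each digit to the text after it.
result = [pieces[0]]
for delim, text in zip(pieces[1::2], pieces[2::2]):
    result.append(f'{delim} {text}')

print('\n'.join(result))
```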
Upvotes: 0