Johnny

Reputation: 47

Split a string without removing the delimiter in Python

I need to split a string in Python without removing the delimiter.

For example:

import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
content = re.split(r'\s\d\s', content)

After this I get:

This\n
string is very big\n
i need to split it\n
into paragraph wise.\n
But this string\n
not a formated string.

but I want this:

This\n
1 string is very big\n
2 i need to split it\n
3 into paragraph wise.\n
4 But this string\n
5 not a formated string.

Upvotes: 3

Views: 4139

Answers (5)

Dalen

Reputation: 4236

If you only need to split on newlines, use the string method splitlines() with keepends=True:

>>> "This\nis\na\ntest".splitlines(True)
['This\n', 'is\n', 'a\n', 'test']

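Otherwise, you can write your own function that keeps the delimiter at the end of each piece:

def split(s, d="\n"):
    """Split s on the delimiter d, keeping d at the end of each piece."""
    d = str(d)
    if d == "":
        raise ValueError("empty separator")
    found = s.find(d)
    if found == -1:
        return [s]  # delimiter absent: nothing to split
    pieces = []
    last = 0  # index just past the previous delimiter
    step = len(d)
    while found != -1:
        pieces.append(s[last:found + step])  # slice includes the delimiter
        last = found + step
        found = s.find(d, last)
    tail = s[last:]  # text after the final delimiter, if any
    if tail:
        pieces.append(tail)
    return pieces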

Upvotes: 0

Kevin

Reputation: 931

You can try this:

import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
[i.group(0).strip() for i in re.finditer(r'\S\d?[^\d]+', content)]

Each match runs until the next digit is reached, but a match is allowed to begin with a digit.

The output is:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']

Upvotes: 0

John Coleman

Reputation: 51997

You could use re.split with a lookahead:

import re
re.split(r'\s(?=\d\s)', content)

resulting in:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']

This splits on whitespace characters, but only those that are immediately followed by a digit and then another whitespace character.
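For reference, a self-contained sketch of the same lookahead split, using the sample string from the question, with the pieces joined by newlines to give the requested layout. The variable name parts is only illustrative.

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# Split on a whitespace character only when a digit and another whitespace
# character follow it. The lookahead (?=...) checks for them without
# consuming them, so the digit stays at the start of the next piece.
parts = re.split(r'\s(?=\d\s)', content)

# Join the pieces with newlines to get one numbered paragraph per line.
print('\n'.join(parts))
```

Because the lookahead consumes nothing, only the single whitespace character before each digit is removed by the split.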

Upvotes: 1

mitghi

Reputation: 919

Use the re module from Python's standard library. With re.sub you can find every match of a regex and replace it with your desired string. \g<0> refers to the entire match (in this case the digit together with the whitespace around it).

Example:

import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
result = re.sub(r'\s\d\s',r'\n\g<0>',content)

The result would be:

'This\n 1 string is very big\n 2 i need to split it\n 3 into paragraph wise.\n 4 But this string\n 5 not a formated string.'

More in-depth details are in the documentation for re.sub.
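Note that the result above keeps a space after each newline, because \g<0> includes the whitespace before the digit. A possible variation, assuming that leading space is unwanted, captures only the digit and the whitespace after it and refers to that group as \1:

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# The whitespace before the digit is matched but not captured, so it is
# replaced by the newline. Group 1 (the digit plus the whitespace after it)
# is written back unchanged via \1.
result = re.sub(r'\s(\d\s)', r'\n\1', content)

print(result)
```

This is a sketch for the sample string only; a number with more than one digit would need \d+ instead of \d.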

Upvotes: 3

xandermonkey

Reputation: 4412

Why not just store the output, iterate over it, and place your delimiters back where you want them? If the delimiters need to change each time, you could use the index of the loop that you use to iterate to decide what they are/need to be.

You might find this post useful.
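One way this suggestion could look in code, as a rough sketch: split as in the question, collect the dropped digits separately with re.findall, then put each digit back in front of the piece that followed it. The names pieces, digits and lines are illustrative, not from the original post.

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# Plain split: the delimiters (whitespace, digit, whitespace) are dropped.
pieces = re.split(r'\s\d\s', content)

# Collect the digits that the split removed, in order of appearance.
digits = re.findall(r'\s(\d)\s', content)

# The first piece precedes any delimiter; every later piece gets the digit
# that came right before it.
lines = [pieces[0]]
for digit, piece in zip(digits, pieces[1:]):
    lines.append(f'{digit} {piece}')

print('\n'.join(lines))
```

Since the delimiters in this sample are consecutive numbers, the loop index from enumerate(pieces[1:], start=1) could stand in for the digits list.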

Upvotes: 0
