Scooby

Reputation: 3581

Split Python string into two on newline nearest the middle

I have a string in Python that is about 3900 characters long and contains many newline characters. For simplicity, consider the following string:

s = "this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of \n new lines \n and I need to split \n it into roughly \n two halves on the new line\n"

I would like to split this string into roughly two halves at a \n, so the expected result would be something like this:

first part = "this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of "
second part = " new lines \n and I need to split \n it into roughly \n two halves on the new line\n"

I have this Python code:

firstpart, secondpart = s[:len(s)//2], s[len(s)//2:]

but this obviously splits the string exactly in half, on whatever character happens to be at that position.

Upvotes: 3

Views: 2425

Answers (4)

eugenhu

Reputation: 1238

Using str.rfind() and str.find():


s = "this is\na long string\nto be split into two halves"
mid = len(s)//2

break_at = min(
    s.rfind('\n', 0, mid),
    s.find('\n', mid),
    key=lambda i: abs(mid - i),  # pick closest to middle
)

if break_at > 0:
    firstpart = s[:break_at]
    secondpart = s[break_at:]
else:  # rfind() and find() return -1 if no '\n' found
    firstpart = s
    secondpart = ''

print(repr((firstpart, secondpart)))
# ('this is\na long string', '\nto be split into two halves')

Note that secondpart begins with the newline character; use s[break_at + 1:] instead if you want to drop it.
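A minimal sketch that wraps the approach above in a function so it can be tried on the question's string and on edge cases. The name `split_near_middle` and its `keep_newline` parameter are hypothetical, not part of any library; with `keep_newline=False` the newline at the split point is dropped, which matches the expected output in the question.

```python
def split_near_middle(s, keep_newline=True):
    """Split s at the newline closest to its middle (hypothetical helper).

    Returns (first, second). If s has no newline, returns (s, '').
    """
    mid = len(s) // 2
    # rfind()/find() return -1 when nothing is found. -1 is always farther
    # from mid than any real index, so it only wins if neither side has a newline.
    break_at = min(
        s.rfind('\n', 0, mid),
        s.find('\n', mid),
        key=lambda i: abs(mid - i),  # pick closest to middle
    )
    if break_at == -1:
        return s, ''
    # Optionally skip the newline itself at the boundary
    skip = 0 if keep_newline else 1
    return s[:break_at], s[break_at + skip:]


print(split_near_middle("this is\na long string\nto be split into two halves"))
# ('this is\na long string', '\nto be split into two halves')
print(split_near_middle("no newline here"))
# ('no newline here', '')
```

Comparing against -1 explicitly (rather than using `> 0`) also lets a newline at index 0 count as a valid split point.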

Upvotes: 6

pault

Reputation: 43504

Here's another way. Split the string on '\n', and keep track of 3 things:

  • The index in the split string list
  • The absolute difference between the start offset of the substring in s and the middle of the string
  • The substring

For example:

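pieces = s.split('\n')
# Track each piece's start offset; s.find(x) would return the first match,
# which is wrong whenever a line (e.g. an empty one) occurs more than once.
offsets = [0]
for x in pieces[:-1]:
    offsets.append(offsets[-1] + len(x) + 1)  # +1 for the '\n' removed by split()
s_split = [(i, abs(len(s)//2 - offsets[i]), x) for i, x in enumerate(pieces)]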
#[(0, 81, 'this is a looooooooooooooooooooooooooong string which is '),
# (1, 23, ' split into '),
# (2, 10, ' a lot of '),
# (3, 1, ' new lines '),
# (4, 13, ' and I need to split '),
# (5, 35, ' it into roughly '),
# (6, 53, ' two halves on the new line'),
# (7, 81, '')]

Now take the tuple with the smallest second element (min() is enough, no full sort is needed) to find the substring that starts closest to the middle. Use its index to rebuild the two strings by joining the pieces with '\n':

idx_left = min(s_split, key=lambda x: x[1])[0]
first = "\n".join([s_split[i][2] for i in range(idx_left)])
second = "\n".join([s_split[i][2] for i in range(idx_left, len(s_split))])

print("%r"%first)
print("%r"%second)
#'this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of '
#' new lines \n and I need to split \n it into roughly \n two halves on the new line\n'
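The same idea can be packaged as a function. This sketch computes each piece's start offset with `itertools.accumulate` (its `initial=` keyword needs Python 3.8+) instead of searching for each piece; the function name is hypothetical. As in the answer, the newline just before the chosen piece is dropped, and a string with no newline comes back as `('', s)`.

```python
from itertools import accumulate


def split_on_line_nearest_middle(s):
    """Hypothetical helper: split s on the line boundary nearest its middle.

    Picks the newline-separated piece whose start offset is closest to
    len(s) // 2 and rejoins the pieces on either side of it.
    """
    pieces = s.split('\n')
    # Start offset of each piece: 0, then previous start + len(piece) + 1
    # (the +1 accounts for the newline removed by split()).
    starts = list(accumulate((len(p) + 1 for p in pieces[:-1]), initial=0))
    mid = len(s) // 2
    idx = min(range(len(pieces)), key=lambda i: abs(mid - starts[i]))
    return '\n'.join(pieces[:idx]), '\n'.join(pieces[idx:])


first, second = split_on_line_nearest_middle("one\ntwo\nthree\nfour")
print(repr(first), repr(second))
# 'one\ntwo' 'three\nfour'
```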

Upvotes: 2

Rayadurai

Reputation: 66

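Alternatively, split by the number of lines instead of the number of characters:

split = s.splitlines()
half = len(split)//2

first = '\n'.join(split[:half])
second = '\n'.join(split[half:])

splitlines() drops the line terminators, so second does not end with the original trailing '\n'.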
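If splitting by line count is acceptable, a lossless variant keeps each line's terminator via `str.splitlines(keepends=True)`, so the two halves concatenate back to the original string. This is a minimal sketch with a hypothetical function name; note that `splitlines()` also treats `\r`, `\r\n` and several other characters as line boundaries, not only `\n`.

```python
def split_by_line_count(s):
    """Hypothetical helper: split s into two halves by number of lines.

    Line endings are kept, so first + second == s.
    """
    lines = s.splitlines(keepends=True)
    half = len(lines) // 2
    return ''.join(lines[:half]), ''.join(lines[half:])


print(split_by_line_count("a\nb\nc\nd\n"))
# ('a\nb\n', 'c\nd\n')
```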

Upvotes: 0

Arkady

Reputation: 15059

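Try this (it splits on the first newline at or after the middle and drops that newline):

mid = len(s)//2
about_mid = mid + s[mid:].index('\n')  # raises ValueError if no '\n' follows the middle

parts = s[:about_mid], s[about_mid+1:]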
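`s[mid:].index('\n')` raises `ValueError` when there is no newline after the middle. A guarded sketch of the same idea uses `str.find()`, which returns -1 instead, and falls back to the last newline before the middle. The function name and the fallback rule are assumptions, not part of the original answer.

```python
def split_at_newline_after_middle(s):
    """Hypothetical helper: guarded version of the forward search.

    Splits on the first newline at or after len(s) // 2, dropping it.
    Falls back to the last newline before the middle; if s has no
    newline at all, returns (s, '').
    """
    mid = len(s) // 2
    about_mid = s.find('\n', mid)        # -1 instead of ValueError when absent
    if about_mid == -1:
        about_mid = s.rfind('\n', 0, mid)
    if about_mid == -1:
        return s, ''
    return s[:about_mid], s[about_mid + 1:]


print(split_at_newline_after_middle("abcdefgh\nij"))  # ('abcdefgh', 'ij')
print(split_at_newline_after_middle("ab\ncdefghij"))  # ('ab', 'cdefghij')
```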

Upvotes: 2
