Reputation: 3581
I have a string in Python that is about 3900 characters long. It contains many different characters, including newlines scattered throughout. For simplicity, consider the following string:
s = "this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of \n new lines \n and I need to split \n it into roughly \n two halves on the new line\n"
I would like to split the above string into roughly two halves on \n, so the expected result would be something like this:
first part = "this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of "
second part = " new lines \n and I need to split \n it into roughly \n two halves on the new line\n"
I have this python code :
firstpart, secondpart = s[:len(s)//2], s[len(s)//2:]
but this splits the string exactly in half, at whatever character happens to be at that position, rather than on a newline.
Upvotes: 3
Views: 2425
Reputation: 1238
Using str.rfind() and str.find():
s = "this is\na long string\nto be split into two halves"
mid = len(s)//2
break_at = min(
    s.rfind('\n', 0, mid),
    s.find('\n', mid),
    key=lambda i: abs(mid - i),  # pick closest to middle
)
if break_at > 0:
    firstpart = s[:break_at]
    secondpart = s[break_at:]
else:  # rfind() and find() return -1 if no '\n' found
    firstpart = s
    secondpart = ''
print(repr((firstpart, secondpart)))
# ('this is\na long string', '\nto be split into two halves')
Note that secondpart will begin with the newline character.
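The same idea can be wrapped in a reusable function. The following is a minimal sketch, not part of the answer above; the name split_near_middle and the sep parameter are assumptions introduced for illustration. It filters out the -1 "not found" results explicitly instead of relying on them losing the min() comparison.

```python
def split_near_middle(s, sep="\n"):
    """Split s at the occurrence of sep closest to the middle.

    Hypothetical helper for illustration. The separator stays at the
    start of the second part, so first + second == s.
    """
    mid = len(s) // 2
    # Keep only the candidates that were actually found (-1 means "not found").
    found = [i for i in (s.rfind(sep, 0, mid), s.find(sep, mid)) if i != -1]
    if not found:
        return s, ""
    break_at = min(found, key=lambda i: abs(mid - i))
    return s[:break_at], s[break_at:]


text = "this is\na long string\nto be split into two halves"
first, second = split_near_middle(text)
print(repr((first, second)))
# ('this is\na long string', '\nto be split into two halves')
```

Because nothing is dropped, the two parts can be concatenated to recover the original string.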
Upvotes: 6
Reputation: 43504
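Here's another way. Split the string on '\n' and keep track of three things for each piece: its index in the list, the distance between where the piece starts in s and the midpoint of s, and the piece itself.
For example: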
s_split = [(i, abs(len(s)//2 - s.find(x)), x) for i, x in enumerate(s.split('\n'))]
#[(0, 81, 'this is a looooooooooooooooooooooooooong string which is '),
# (1, 23, ' split into '),
# (2, 10, ' a lot of '),
# (3, 1, ' new lines '),
# (4, 13, ' and I need to split '),
# (5, 35, ' it into roughly '),
# (6, 53, ' two halves on the new line'),
# (7, 81, '')]
Now take the entry with the smallest second element to find the substring that starts closest to the middle. Use its index to build the two strings by joining with '\n':
idx_left = min(s_split, key=lambda x: x[1])[0]
first = "\n".join([s_split[i][2] for i in range(idx_left)])
second = "\n".join([s_split[i][2] for i in range(idx_left, len(s_split))])
print("%r"%first)
print("%r"%second)
#'this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of '
#' new lines \n and I need to split \n it into roughly \n two halves on the new line\n'
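One caveat: s.find(x) returns the first occurrence of x, so repeated lines (or the empty piece after a trailing newline) are measured from the wrong position. A hedged sketch that tracks each piece's start offset with a running total instead; the function name split_on_closest_line is hypothetical:

```python
def split_on_closest_line(s):
    """Split s before the line whose start is closest to the midpoint.

    Hypothetical helper for illustration. As in the answer above, the
    newline between the two parts is not kept in either part.
    """
    pieces = s.split("\n")
    mid = len(s) // 2

    # Start offset of each piece in s, accumulated as we go.
    starts = []
    offset = 0
    for piece in pieces:
        starts.append(offset)
        offset += len(piece) + 1  # +1 for the '\n' removed by split()

    idx = min(range(len(pieces)), key=lambda i: abs(mid - starts[i]))
    return "\n".join(pieces[:idx]), "\n".join(pieces[idx:])


# Every line is identical here, so s.find(x) would report offset 0 for all of them.
print(split_on_closest_line("x\nx\nx\nx\nx\nx"))
# ('x\nx', 'x\nx\nx\nx')
```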
Upvotes: 2
Reputation: 66
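Also try this. It splits by line count rather than by character count, so the two halves can differ in length if the lines do:
lines = s.splitlines(True)  # True keeps the line endings
half = len(lines) // 2
first = ''.join(lines[:half])
second = ''.join(lines[half:])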
Upvotes: 0
Reputation: 15059
Try this:
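mid = len(s) // 2  # integer division, so mid can be used as an index
about_mid = mid + s[mid:].index('\n')  # raises ValueError if there is no '\n' after mid
parts = s[:about_mid], s[about_mid+1:]  # the '\n' at about_mid is dropped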
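If the string may have no newline after the midpoint, str.partition avoids the ValueError from index(). A minimal sketch under that assumption; split_after_middle is a hypothetical name, and like the snippet above it drops the newline at the split point:

```python
def split_after_middle(s):
    """Split s at the first '\n' at or after the midpoint.

    Hypothetical helper for illustration. Returns (s, '') when there
    is no newline in the second half of the string.
    """
    mid = len(s) // 2
    # partition() returns (head, sep, tail); sep is '' if '\n' was not found.
    head, sep, tail = s[mid:].partition("\n")
    if not sep:
        return s, ""
    return s[:mid] + head, tail


print(split_after_middle("ab\ncd\nef"))
# ('ab\ncd', 'ef')
```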
Upvotes: 2