Reputation: 27
Python
I want to split a string into parts of at most 5000 characters each. The split must never fall inside a word; it should only happen at a space.
I iterate through the string character by character and cut off a part every 4980 characters; if a part shorter than 4980 characters remains at the end, I translate that too. I am new to Python, so my method works but is a mess and certainly isn't good code.
I haven't checked for spaces because Japanese and Chinese text has none, but this check is still needed so that a word is never split in two.
with open('lightnovel.txt', 'r', encoding='utf8') as f:
    text = f.read()

# trans() is the translation function, defined elsewhere.
count = 0
partofbook = ''
for character in text:
    count += 1
    partofbook += character
    if count >= 4980:
        # The part is full: translate it and start a new one.
        trans(partofbook)
        partofbook = ''
        count = 0
# Translate whatever is left after the last full part.
if partofbook:
    trans(partofbook)
Upvotes: 1
Views: 539
Reputation: 78613
I would start at index 5000 and iterate backwards until you find whitespace, say at position A. Your first output is then string[0, A-1] (in Python, you can use s[0:A] to get this substring).
Then jump ahead to index A+5000 and do the same thing, searching backwards for whitespace, found at index B, so your next output is string[A+1, B-1] (in Python, you can use s[A+1:B] to get this substring). Note: it's A+1 because you want to skip the whitespace found at index A.
Repeat until done, and check that you don't index beyond len(string).
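The steps above could be sketched roughly like this. The function name split_at_whitespace and the limit parameter are my own, not from the question, and the hard cut for text with no whitespace at all (as in Japanese or Chinese) is an assumption about what you would want in that case.

```python
def split_at_whitespace(text, limit=5000):
    """Split text into chunks of at most `limit` characters, cutting at whitespace."""
    chunks = []
    start = 0
    while len(text) - start > limit:
        # Search backwards from the furthest allowed position for whitespace.
        cut = start + limit
        while cut > start and not text[cut].isspace():
            cut -= 1
        if cut == start:
            # No usable whitespace in this window (assumed fallback):
            # cut hard at the limit instead.
            cut = start + limit
            chunks.append(text[start:cut])
            start = cut
        else:
            chunks.append(text[start:cut])
            start = cut + 1  # skip the whitespace found at index `cut`
    if start < len(text):
        # Whatever remains fits within the limit.
        chunks.append(text[start:])
    return chunks
```

You would then call trans(chunk) for each chunk in split_at_whitespace(text).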
Also, see
Upvotes: 0
Reputation: 15
I'm also new to Python, so I apologise for not implementing this in your code.
There is a method called string.split() (where string is the text you want to split). Called without arguments, it splits only at whitespace and returns a list of words.
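As a rough sketch of how split() could be used here: collect the words into chunks until adding the next word would exceed the limit. The name chunk_words is made up for illustration. Note two assumptions: newlines and repeated spaces are collapsed into single spaces, and a single word longer than the limit ends up as its own oversized chunk.

```python
def chunk_words(text, limit=5000):
    """Greedily pack whitespace-separated words into chunks of at most `limit` characters."""
    chunks = []
    current = ''
    for word in text.split():
        if not current:
            # First word of a new chunk (may exceed the limit on its own).
            current = word
        elif len(current) + 1 + len(word) <= limit:
            # The word still fits, including the joining space.
            current += ' ' + word
        else:
            # The chunk is full: store it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

This will not help with text that has no spaces at all, since split() then returns the whole text as one word.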
Upvotes: 1