Dani Suba

Reputation: 27

Split a string into pieces by length in Python

Python
I want to split a string into parts of at most 5000 characters each. The split must never fall inside a word; it should happen only at a space.
I iterate through the string character by character and cut off a part every 4980 characters; if a part shorter than 4980 characters remains at the end, I translate that too. I am new to Python, so I'm sure my method is a mess: it works, but it certainly isn't good code.
I haven't checked for spaces in the string yet because Japanese and Chinese text has none, but that check is needed as well so that a word is not split into two parts.

with open('lightnovel.txt', 'r', encoding="utf8") as f:
    file = f.read()

# trans() is my translation function, defined elsewhere.
db = 0           # number of characters collected in the current part
partofbook = ''
for character in file:
    db = db + 1
    partofbook = partofbook + character
    if db >= 4980:
        # The part is full: translate it and start a new one.
        trans(partofbook)
        db = 0
        partofbook = ''
# Translate whatever is left over (shorter than 4980 characters).
if partofbook:
    trans(partofbook)

Upvotes: 1

Views: 539

Answers (2)

jarmod

Reputation: 78613

I would start at index 5000 and iterate backwards until you find whitespace, at position A, say. Your first output is then the characters at indices 0 through A-1 (in Python, s[0:A] gives this substring).

Then jump ahead to index A+5000 and do the same thing: search backwards for whitespace, found at index B, so your next output is the characters at indices A+1 through B-1 (in Python, s[A+1:B] gives this substring). Note that it's A+1 because you want to skip the whitespace found at index A.

Repeat until done. Obviously check that you don't skip beyond len(string).
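
The steps above could be sketched roughly as follows. This is only an illustration, not tested against the asker's data: the function name split_by_length and the limit parameter are made up for the example, only space, tab and newline are treated as whitespace, and it assumes a hard cut is acceptable when a window contains no whitespace at all (as in Japanese or Chinese text).

```python
def split_by_length(text, limit=5000):
    """Split text into pieces of at most `limit` characters, cutting at whitespace."""
    pieces = []
    start = 0
    while len(text) - start > limit:
        # Look at limit + 1 characters so a whitespace right after a full piece counts.
        window = text[start:start + limit + 1]
        # Search backwards for the last space, tab or newline in the window.
        cut = max(window.rfind(ws) for ws in " \t\n")
        if cut <= 0:
            # Assumption: no usable whitespace, so fall back to a hard cut.
            pieces.append(text[start:start + limit])
            start += limit
        else:
            pieces.append(window[:cut])
            start += cut + 1  # skip the whitespace we cut at
    if start < len(text):
        pieces.append(text[start:])  # the remainder is already short enough
    return pieces
```

Each returned piece could then be passed to the translation function from the question, for example `for piece in split_by_length(file): trans(piece)`.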

Upvotes: 0

MSS98

Reputation: 15

I'm also new to Python, so I apologise for not working this into your code.

There is a method called string.split() (where string is the sentence you want to split).

Called with no arguments, it splits only at whitespace, so words are never cut in half.
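
One way split() might be combined with the length limit is to glue the words back together until the next word would push a piece over the limit. The sketch below is only an illustration: chunk_words and limit are hypothetical names, runs of whitespace (including newlines) are collapsed to single spaces, and a single word longer than the limit is kept whole rather than cut.

```python
def chunk_words(text, limit=5000):
    """Group whitespace-separated words into pieces of at most `limit` characters."""
    chunks = []
    current = ""
    for word in text.split():  # split() with no arguments splits on any whitespace
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # Assumption: a word longer than `limit` is kept whole, not cut.
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Because the original spacing and line breaks are lost, this suits plain running text better than text whose layout matters.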

Upvotes: 1
