Adil kasbaoui

Reputation: 663

Get N characters from a string, respecting full words (Python)

I am using this code to get the first 4000 characters of a long text.

text = data[0:4000]
print(text)

data is the variable containing the long text. The problem is that when I print text, it ends with half a word, for example "con" when the word should be "content".

I am wondering if there is a way to ensure the words aren't truncated.

Upvotes: 1

Views: 696

Answers (2)

Steve Shay

Reputation: 116

A simple find call that looks for a space starting at index 4000 gets this started:

x = txt.find(' ',4000)

To avoid truncating the last word, you also need to test the result of the find call.

If there is no space at or after index 4000 (the index falls inside the last word, or past the end of the text), find returns -1 and you should print or return the entire text.

If there is a space at or after index 4000, find returns the index of that space and you print everything up to that index:

x = txt.find(' ',4000)
if x < 0:
    print(txt)
else:
    print(txt[:x])

Also remember that the start argument of find is zero-based, so index 4000 is the 4001st character. If the 4000th character (index 3999) is a space, find skips it and returns the next space instead. As a simple example, the following code prints "four five" rather than just "four", because the space after "four" is at index 4 and the search starts at index 5. If that is not the desired result, consider using 3999 in your find.

txt = "four five six"
x = txt.find(' ',5)
print(txt[:x])
# prints "four five"
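
For reuse, the same check can be wrapped in a small function. This is only a sketch of the approach above: the name cut_after_word and its start parameter are made up for illustration, and start is assumed to be non-negative (a negative value would be counted from the end of the string by find).

```python
def cut_after_word(txt, start):
    """Return txt cut at the first space found at or after index `start`.

    If no such space exists, the whole text is returned so the last
    word stays intact. Assumes start >= 0.
    """
    x = txt.find(' ', start)
    return txt if x < 0 else txt[:x]


print(cut_after_word("four five six", 5))   # four five
print(cut_after_word("four five six", 4))   # four
print(cut_after_word("four five six", 11))  # four five six
```

Note that the result can be longer than start characters, since the cut moves forward to the end of the current word.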

Upvotes: 2

James

Reputation: 36623

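Find the first space at or after index 4000 and slice up to it. Wrapping the call in max handles the case where find returns -1 because no space follows: the slice then falls back to the first 4000 characters. Be aware that this fallback can still split the final word if the text ends a few characters past 4000 with no space at the end.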

ix = max(data.find(' ', 4000), 4000)
text = data[:ix]
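
If the result must never exceed 4000 characters, one alternative is to search backward from the limit with rfind instead of forward with find. The following is a minimal sketch under that assumption, not part of the answer above; cut_before_limit and limit are hypothetical names.

```python
def cut_before_limit(data, limit=4000):
    """Return at most `limit` characters of data, ending on a word boundary."""
    # Short text: nothing to cut
    if len(data) <= limit:
        return data
    # The character right after the cut is a space, so the cut is already clean
    if data[limit] == ' ':
        return data[:limit]
    # Otherwise back up to the last space before the limit
    ix = data.rfind(' ', 0, limit)
    # No space in the first `limit` characters: fall back to a hard cut
    return data[:limit] if ix < 0 else data[:ix]


print(cut_before_limit("four five six", 7))   # four
print(cut_before_limit("four five six", 12))  # four five
```

The trade-off is that this version may return noticeably fewer than limit characters when the last word before the limit is long.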

Upvotes: 3
