Robert Lime

Reputation: 35

Split a text by word length in Python

I have the following text:

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."

I want to split it into a new pandas DataFrame with one row for every five words, like this:

id Text
0 I have an Apple. I
1 have a Banana. I have
2 an Orange I have a
3 Watermelon

Any help is much appreciated!

Upvotes: 0

Views: 244

Answers (2)

Ynjxsjmh

Reputation: 30042

You can put one word per row, group the rows by index // 5 with groupby, then aggregate each group by joining its words:

import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."

df = pd.DataFrame({'Text': text.split()})

out = df.groupby(df.index//5).agg({'Text': ' '.join})
print(out)

                    Text
0     I have an Apple. I
1  have a Banana. I have
2    an Orange. I have a
3            Watermelon.
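
To see why this grouping works, it may help to look at the keys that df.index // 5 produces. The snippet below is only an illustrative sketch; the variable name group_keys is my own and not part of the answer above.

```python
import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."

# One word per row, so the default index is 0..15.
df = pd.DataFrame({'Text': text.split()})

# Integer division by 5 maps rows 0-4 to key 0, rows 5-9 to key 1, and so on.
# group_keys is a hypothetical name used only for this illustration.
group_keys = (df.index // 5).tolist()
print(group_keys)
```

Rows that share a key end up in the same group, and ' '.join then concatenates the words of each group in their original order.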

Upvotes: 2

Aravind G.

Reputation: 431

This would work:

import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
words = text.split()

# Join each run of five consecutive words into one string.
sentences = []
for i in range(0, len(words), 5):
    sentence = ' '.join(words[i:i+5])
    sentences.append(sentence)

# Build a one-column DataFrame with one row per chunk.
series = pd.Series(sentences)
df = series.to_frame()
df.columns = ['Text']

The resulting DataFrame df then looks like this, which matches the layout asked for in the question (with the original punctuation preserved):

                    Text
0     I have an Apple. I
1  have a Banana. I have
2    an Orange. I have a
3            Watermelon.
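
If the chunk size needs to vary, the loop above could be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name chunk_words and its size parameter are hypothetical, and it assumes words are separated by whitespace.

```python
def chunk_words(text, size=5):
    """Join every `size` whitespace-separated words into one string.

    Hypothetical helper for illustration; the last chunk may be shorter.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    words = text.split()
    # Step through the word list in strides of `size` and join each slice.
    return [' '.join(words[i:i + size]) for i in range(0, len(words), size)]


text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
chunks = chunk_words(text)
print(chunks)
```

Passing the returned list to pd.DataFrame({'Text': chunks}) should give the same one-column frame as shown above.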

Upvotes: 1
