mrconcerned

Reputation: 1945

How to delete sentences with only one word in Python

I'm currently working with a dataset that contains more than 10,000 news items, and I want to delete the sentences that contain only one word. I have looked into nltk and textcleaner, but I wasn't able to delete the one-word sentences with either.

For example:

Input: I want to delete sentence with one word. Okay. Fine.Let's do it.

Output: I want to delete sentence with one word. Let's do it.

The code is:

    import textcleaner as tc
    import nltk
    import numpy as np

    datafile = np.genfromtxt("f12filtered.txt", encoding='utf-8', delimiter=".")

    data = tc.document(datafile)
    data.remove_stpwrds()

Upvotes: 0

Views: 1675

Answers (1)

Yoshitha Penaganti

Reputation: 464

You can split the text into a list of sentences using '.' as the delimiter, and then drop any sentence that contains only one word. The result is a list; you can join it back together if you want to work with the complete text, or use it as it is. You can do this using the following code:

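    # Split on '.', then keep only sentences with two or more words.
    # Building a new list avoids removing items from a list while iterating
    # over it, and split() with no argument ignores the leading space
    # that follows each '.'.
    sentences = data.split('.')
    data = [sent for sent in sentences if len(sent.split()) >= 2]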

To join data to form a single string:

    data = '.'.join(data)
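If you want the result to keep its punctuation and spacing, as in the expected output in the question, a regular expression may be easier than splitting and joining on '.'. The following is only a minimal sketch: the function name `drop_one_word_sentences` and the `min_words` parameter are made up for illustration, and it assumes that every '.', '!' or '?' ends a sentence, so abbreviations such as "U.S." or decimals such as "3.14" would be split incorrectly.

```python
import re

def drop_one_word_sentences(text, min_words=2):
    """Return text without sentences shorter than min_words words.

    Hypothetical helper: treats every '.', '!' or '?' as a sentence end.
    """
    # Each match is a run of non-terminator characters plus the
    # punctuation that ends it, so the '.' stays with its sentence.
    sentences = re.findall(r'[^.!?]+[.!?]*', text)
    # split() with no argument counts words and ignores extra whitespace.
    kept = [s.strip() for s in sentences if len(s.split()) >= min_words]
    return ' '.join(kept)

text = "I want to delete sentence with one word. Okay. Fine.Let's do it."
print(drop_one_word_sentences(text))
# prints: I want to delete sentence with one word. Let's do it.
```

For real news text, a proper sentence tokenizer would probably handle abbreviations better than this pattern, but the filtering step would stay the same.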

Upvotes: 2
