Reputation: 313

How can I remove all members of a list in Python?

I am reading from a file x that contains individual data items, separated from each other by a blank line. I want to calculate tf_idf_vectorizer() for each item, so I need to remove all members of tweets whenever the code finds a new line (\n). I get an error on the marked line in my code (for i in tweets: tweets.remove(i)).

def load_text():
    file=open('x.txt', 'r')
    tweets = []
    all_matrix = []

    for line in file:
        if line in ['\n', '\r\n']:
            all_matrix.append(tf_idf_vectorizer(tweets))
            for i in tweets: tweets.remove(i)  # <-- the line in question
        else:
            tweets.append(line)

    file.close()

    return all_matrix

Upvotes: 1

Answers (4)

abarnert

Reputation: 365707

If you actually need to empty out the list in-place, the way you do it is either:

del tweets[:]

… or …

tweets[:] = []

In general, you can delete or replace any subslice of a list in this way; [:] is just the subslice that means "the whole list".
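As a small illustration of deleting and replacing slices (the list contents here are made up for the example, not taken from the question):

```python
# Hypothetical sample data, not from the question.
tweets = ["a", "b", "c", "d", "e"]

# Delete a subslice: removes the items at indices 1 and 2.
del tweets[1:3]
after_subslice = list(tweets)

# Replace a subslice with new contents (the lengths need not match).
tweets[0:1] = ["x", "y"]
after_replace = list(tweets)

# Empty the whole list in place; the list object itself stays the same.
del tweets[:]
after_clear = list(tweets)
```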

However, since nobody else has a reference to tweets, there's really no reason to empty out the list; just create a new empty list, and bind tweets to that, and let the old list become garbage to be cleaned up:

tweets = []
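The difference between rebinding the name and emptying in place only shows up when something else holds a reference to the same list. A minimal sketch, with the names alias, shared and other invented for the illustration:

```python
tweets = ["first", "second"]
alias = tweets                      # a second reference to the same list object

tweets = []                         # rebinding: only the name tweets changes
alias_after_rebind = list(alias)    # the old list is untouched

shared = ["first", "second"]
other = shared
del shared[:]                       # in place: every reference sees the change
other_after_clear = list(other)
```

In the question's function nothing else refers to tweets, which is why rebinding is enough there.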

Anyway, there are two big problems with this:

for i in tweets: tweets.remove(i)

First, when you want to remove a specific element, you should never use remove. It has to search the list for a matching element, which is wasteful (you already know which one you want) and also incorrect if there are duplicates (remove deletes the first equal element, which may not be the one you are looking at). Instead, use the index: for example, del tweets[index]. You can use the enumerate function to get the indices. The same is true for lots of other list and string methods: don't use index, find, etc. with a value when you could get the index directly.
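To make the duplicate problem concrete, here is a small sketch with made-up data. It tries to drop the second "spam", once by value and once by index:

```python
# Hypothetical sample data with a duplicate value.
words = ["spam", "eggs", "spam", "ham"]

# remove() deletes the first equal element, not necessarily the one you meant.
by_value = list(words)
by_value.remove("spam")

# With enumerate you already hold the index, so delete by position instead.
by_index = list(words)
for index, word in enumerate(by_index):
    if word == "spam" and index > 0:    # the second "spam"
        del by_index[index]
        break                           # stop at once: the list was just mutated
```

After this, by_value has lost the first "spam" while by_index has lost the second one.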

Second, if you remove the first element, everything else shifts down by one position. So, first you remove element #0. Then, when the loop moves on to position #1, that's not the original element #1, but the original #2, which has shifted into its place. So you skip every other element, and once you're halfway through, the loop's position is past the (new) end of the list, so it stops with about half the elements still there. In general, avoid mutating a list while iterating over it; if you must mutate it, it's only safe to do so from the right, not the left (and it's still tricky to get right).
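The effect of removing while iterating is easy to reproduce. A short sketch with invented data, recording which items the loop actually visits:

```python
# Hypothetical sample data, not from the question.
tweets = ["t0", "t1", "t2", "t3", "t4", "t5"]

visited = []
for item in tweets:
    visited.append(item)    # record what the loop actually sees
    tweets.remove(item)     # mutating the list we are iterating over

leftover = list(tweets)     # the elements the loop never reached
```

Here visited ends up as t0, t2, t4 and leftover as t1, t3, t5: every other element is skipped and the list is never emptied.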

The right way to remove elements one by one from the left is:

while tweets:
    del tweets[0]

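However, this will be pretty slow, because every del tweets[0] has to shift all of the remaining elements down one slot, so emptying the list this way takes quadratic time. So it's still better to go from the right, where nothing needs to shift: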

while tweets:
    del tweets[-1]

But again, there's no need to go one by one when you can just do the whole thing at once, or not even do it, as explained above.
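Putting that together, one possible rewrite of the question's load_text could look like the sketch below. The path and vectorize parameters are assumptions added so the example runs on its own; vectorize stands in for the question's tf_idf_vectorizer.

```python
def load_text(path, vectorize):
    """Group non-blank lines into blocks and vectorize each block.

    vectorize is a stand-in for the question's tf_idf_vectorizer; it is
    a parameter here only so that this sketch is self-contained.
    """
    all_matrix = []
    tweets = []
    with open(path) as file:
        for line in file:
            if line in ('\n', '\r\n'):
                all_matrix.append(vectorize(tweets))
                tweets = []     # bind a fresh list instead of emptying the old one
            else:
                tweets.append(line)
    return all_matrix
```

Like the original, this only emits a block when it reaches a blank line, so a final block with no blank line after it is not vectorized.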

Upvotes: 3

Jon Clements

Reputation: 142146

You could also re-work the code to be:

from itertools import groupby

def load_tweet(filename):
    with open(filename) as fin:
        tweet_blocks = (g for k, g in groupby(fin, lambda line: bool(line.strip())) if k)
        return [tf_idf_vectorizer(list(tweets)) for tweets in tweet_blocks]

This groups the file into alternating runs of non-blank and blank lines. For each run of non-blank lines, we build a list and pass it to the vectorizer inside a list comprehension. This means no references to old lists are left hanging about, and nothing is appended to a list one line at a time.
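To see what groupby produces, here is the same grouping applied to an in-memory list of lines instead of a file (the sample lines are made up):

```python
from itertools import groupby

# Hypothetical sample input: blocks of text separated by blank lines.
lines = ["tweet a\n", "tweet b\n", "\n", "tweet c\n", "\n", "\n", "tweet d\n"]

# The key is True for non-blank lines and False for blank ones, so groupby
# yields alternating runs; only the non-blank runs are kept.
blocks = [
    list(group)
    for is_text, group in groupby(lines, key=lambda line: bool(line.strip()))
    if is_text
]
```

This yields three blocks. Consecutive blank lines collapse into a single separator, and the last block is kept even though no blank line follows it.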

Upvotes: 0