kmurp62rulz

Reputation: 259

How to read lines from a text file but exclude lines containing specific words in Python

I am trying to make a program that counts the number of tweets a user has made, reading from a text file. The only problem is that I need to exclude any lines containing "DM" or "RT".

file = open('stream.txt', 'r')
fileread = file.readlines()
tweets = [string.split() for string in fileread]

How can I change my code to make sure it excludes the lines with "DM" or "RT"?

All help is appreciated :D

Upvotes: 2

Views: 2487

Answers (4)

zar3bski

Reputation: 3171

Here is a concise solution, since you seem to like list comprehensions ;-)

with open('stream.txt', 'r') as file:
    fileread = file.readlines()
# Keep only the lines that do not start with "DM" or "RT"
goodlines = [line for line in fileread if not line.startswith(("DM", "RT"))]
tweets = [line.split() for line in goodlines]

goodlines acts as a filter, keeping the lines of fileread whose first two characters are neither 'DM' nor 'RT'. Note that this only checks the start of each line, so a line with "DM" or "RT" further in is kept. (If I understood your problem correctly.)
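
To illustrate the difference between checking only the start of a line and checking anywhere in it, here is a small sketch. The sample lines are made up for illustration and are not taken from the asker's stream.txt.

```python
# Hypothetical sample lines standing in for the contents of stream.txt.
lines = [
    "RT @someone: great post\n",
    "Just had coffee, DM me for details\n",
    "Lovely weather today\n",
]

# Prefix check: drops only lines that begin with "DM" or "RT".
by_prefix = [line for line in lines if not line.startswith(("DM", "RT"))]

# Substring check: drops lines that contain "DM" or "RT" anywhere.
by_substring = [line for line in lines if "DM" not in line and "RT" not in line]

print(len(by_prefix), len(by_substring))
```

Which of the two is right depends on how the file marks retweets and direct messages, which the question does not say.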

Upvotes: 0

koPytok

Reputation: 3713

You can simply iterate over each line in the file:

tweets = []
with open('stream.txt', 'r') as f:
    for line in f:
        if "DM" not in line and "RT" not in line:
            tweets.append(line.split())
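
Since the question asks for a count rather than the split lines, the same loop can be reduced to a sum. This is only a sketch: count_tweets is a hypothetical helper name, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io


def count_tweets(lines, exclude=("DM", "RT")):
    """Count the lines that contain none of the excluded strings."""
    return sum(1 for line in lines if not any(word in line for word in exclude))


# Made-up sample data in place of open('stream.txt', 'r').
sample = io.StringIO("hello world\nRT @a: hi\nDM to b\nanother tweet\n")
print(count_tweets(sample))
```

With the real file, the call would be count_tweets(f) inside the with block.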

Upvotes: 0

offeltoffel

Reputation: 2801

Please always close your file after opening it. The best way to do that is to use with open(...), which closes the file automatically when the block ends.

The solution to your question is to put a condition in your list comprehension:

with open('stream.txt', 'r') as file:
    fileread = file.readlines()

tweets = [string.split() for string in fileread
          if "DM" not in string and "RT" not in string]

In case you want to exclude several strings, you can use any to keep the condition short:

with open('stream.txt', 'r') as file:
    fileread = file.readlines()

exclude = ["DM", "RT"]
tweets = [string.split() for string in fileread
          if not any(word in string for word in exclude)]
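
One caveat with substring checks is that "RT" also matches inside words such as "START". If "DM" and "RT" always appear as standalone, whitespace-separated tokens (an assumption, the question does not confirm it), a whole-word check could look like the sketch below. keep_line is a hypothetical helper and the sample lines are invented.

```python
exclude = {"DM", "RT"}


def keep_line(line, exclude=exclude):
    """Return True if no whitespace-separated token of line is in exclude."""
    return exclude.isdisjoint(line.split())


# Made-up sample lines in place of the file contents.
lines = ["RT @a: hi", "START the ART show", "DM sent to b", "plain tweet"]
tweets = [line.split() for line in lines if keep_line(line)]
print(len(tweets))
```

A token with attached punctuation, such as "RT:", would not be caught by this check.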

Upvotes: 2

Joe Iddon

Reputation: 20424

Filter out lines which contain 'DM' or 'RT' when you declare fileread:

fileread = [line for line in file.readlines() if 'DM' not in line and 'RT' not in line]

Upvotes: 1
