Reputation: 259
I am trying to write a program that counts the number of tweets a user has made, reading from a text file. The only problem is that I need to exclude any lines containing "DM" or "RT".
file = open('stream.txt', 'r')
fileread = file.readlines()
tweets = [string.split() for string in fileread]
How can I change my code to make sure it excludes the lines with "DM" or "RT"?
All help is appreciated :D
Upvotes: 2
Views: 2487
Reputation: 3171
Here is a concise solution (since you seem to like list comprehensions ;-)
file = open('stream.txt', 'r')
fileread = file.readlines()
goodlines = [lines for lines in fileread if lines[:2]!="DM" and lines[:2]!="RT"]
tweets = [string.split() for string in goodlines]
goodlines acts as a filter, keeping only the lines of fileread whose first two characters are neither 'DM' nor 'RT' (if I understood your problem correctly).
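The same prefix check can also be written with str.startswith, which accepts a tuple of prefixes. This is only a minimal sketch: the helper name keep_line and the sample lines are made up for illustration, and it assumes, like the code above, that "DM" or "RT" appears at the very start of the line.

```python
def keep_line(line, prefixes=("DM", "RT")):
    """Return True when the line does not begin with any excluded prefix."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return not line.startswith(prefixes)


# Hypothetical sample lines standing in for the contents of stream.txt.
sample = ["RT @bob: hello\n", "DM alice hi\n", "Just a normal tweet\n"]

goodlines = [line for line in sample if keep_line(line)]
tweets = [line.split() for line in goodlines]
```

With this version, adding another prefix only means extending the tuple.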
Upvotes: 0
Reputation: 3713
You can simply iterate over each line in the file:
tweets = []
with open('stream.txt', 'r') as f:
    for line in f:
        if "DM" not in line and "RT" not in line:
            tweets.append(line.split())
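One caveat with a plain substring test: "DM" in line is also true for a line containing a word such as "ADMIT", and "RT" in line is true for "START". If that matters for your data, a rough sketch that compares whole words instead could look like this. The function name parse_tweets and the sample lines are hypothetical, and skipping blank lines is an extra assumption of mine, not something the question asks for.

```python
EXCLUDED = {"DM", "RT"}


def parse_tweets(lines, excluded=EXCLUDED):
    """Split each line into words, dropping lines that contain an excluded word."""
    tweets = []
    for line in lines:
        words = line.split()
        # Assumption: blank lines are skipped. isdisjoint is True only when
        # none of the excluded markers appears as a whole word in the line.
        if words and excluded.isdisjoint(words):
            tweets.append(words)
    return tweets


# Hypothetical sample lines; "START" contains "RT" only as a substring.
sample = ["RT @bob: hello\n", "I will START now\n", "DM alice hi\n", "\n"]
tweets = parse_tweets(sample)
```

Here the "START" line is kept because "RT" never appears as a word of its own.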
Upvotes: 0
Reputation: 2801
Please always close your file after opening it. The best way to do that is to use with open(...).
The solution to your problem is to put a condition in your list comprehension:
with open('stream.txt', 'r') as file:
    fileread = file.readlines()
tweets = [string.split() for string in fileread
          if "DM" not in string and "RT" not in string]
In case you want to exclude several strings, you can use any to save space:
with open('stream.txt', 'r') as file:
    fileread = file.readlines()
exclude = ["DM", "RT"]
tweets = [string.split() for string in fileread
          if not any(word in string for word in exclude)]
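Since the goal in the question is a count, the same any filter can feed a counter directly. The following is only a sketch: count_tweets is a made-up helper name, it assumes one tweet per line, and the demo writes a temporary file with invented contents instead of reading the real stream.txt.

```python
import os
import tempfile


def count_tweets(path, exclude=("DM", "RT")):
    """Count the lines of the file that contain none of the excluded strings."""
    with open(path, "r") as f:
        # Iterating over the file reads one line at a time.
        return sum(1 for line in f
                   if not any(word in line for word in exclude))


# Demo on a temporary file with hypothetical contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first tweet\nRT @bob: hi\nDM alice hello\nsecond tweet\n")
    demo_path = tmp.name

demo_count = count_tweets(demo_path)
os.remove(demo_path)
print(demo_count)
```

For the real file, the call would be count_tweets('stream.txt').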
Upvotes: 2
Reputation: 20424
Filter out lines which contain 'DM' or 'RT' when you declare fileread:
fileread = [l for l in file.readlines() if not 'DM' in l and not 'RT' in l]
Upvotes: 1