Reputation: 173
Code:
with open("filename.txt", "r") as f:  # I'm not sure about opening it with 'r', because I would be removing lines.
    lines = f.readlines()  # stores each line of the txt file in 'lines'

invalid_line_count = 0
for line in lines:  # iterates through each line of the txt file
    if line is invalid:  # pseudocode: this is the check I don't know how to write
        # something which removes the invalid lines
        invalid_line_count += 1
print("There were " + str(invalid_line_count) + " amount of invalid lines.")
I have a text file like so:
1,2,3,0,0
2,3,0,1,0
0,0,0,1,2
1,0,3,0,0
3,2,1,0,0
The valid line structure is 5 values split by commas. For a line to be valid, it must have a 1, 2, 3 and two 0's. It doesn't matter in what position these numbers are.
An example of a valid line is 1,2,3,0,0
An example of an invalid line is 1,0,3,0,0, as it does not contain a 2 and has three 0's instead of two.
I would like to be able to iterate through the text file and remove the invalid lines, and maybe print a little message saying "There were x amount of invalid lines."
Or maybe as suggested:
As you read each line from the original file, test it for validity. If it passes, write it out to the new file. When you're finished, rename the original file to something else, then rename the new file to the original file.
I thought the csv module might help, so I read its documentation, but it didn't help me.
Any ideas?
Upvotes: 0
Views: 63
Reputation: 777
For a line to be valid, each line must have a 1, 2, 3 and 2 0's. It doesn't matter in what position these numbers are.
CHUNK_SIZE = 65536


def _is_valid(line):
    """Check if a line is valid.

    A line is valid if it is of length 5 and contains '1', '2', '3',
    in any order, as well as '0', twice.

    :param list line: The line to check, already split into its values.
    :return: True if the line is valid, else False.
    :rtype: bool
    """
    if len(line) != 5:
        # If there aren't exactly five elements in the line, return False.
        return False
    if all(x in line for x in {"1", "2", "3"}) and line.count("0") == 2:
        # Builtin `all` checks whether a condition (in this case `x in line`)
        # holds for every element of an iterable.
        # `list.count` returns the number of times a specific
        # element appears in the list. If "0" appears exactly twice in the
        # line and the `all` call returns True, the line is valid.
        return True
    # If the previous block doesn't execute, the line isn't valid.
    return False


def get_valid_lines(path):
    """Get the valid lines from a file.

    The valid lines will be written back to `path`.

    :param str path: The path to the file.
    :return: None
    :rtype: None
    """
    invalid_lines = 0
    contents = []
    valid_lines = []
    with open(path, "r") as f:
        # Open the `path` parameter in reading mode.
        while True:
            chunk = f.read(CHUNK_SIZE)
            # Read up to `CHUNK_SIZE` (65536) characters from the file.
            if not chunk:
                # At the end of the file, `read` returns an empty string.
                break
            contents.append(chunk)
            # If the chunk is not empty, add it to the contents.
    contents = "".join(contents).splitlines()
    # `contents` is a list of chunks of up to 65536 characters. We join
    # them using `str.join`, then split the result into individual lines.
    for line in contents:
        # Split each line on commas so `_is_valid` receives a list of values.
        if not _is_valid(line=line.split(",")):
            invalid_lines += 1
        else:
            valid_lines.append(line)
    print("Found {} invalid lines".format(invalid_lines))
    with open(path, "w") as f:
        for line in valid_lines:
            f.write(line)
            f.write("\n")
I'm splitting this up into two functions: one to check whether a line is valid according to your rules, and a second one to manipulate the file. If you want to return the valid lines instead, just remove the second with statement and replace it with return valid_lines.
Upvotes: 0
Reputation: 92854
One possible approach:
with open('filename.txt', 'r+') as f:  # opening file in read/write mode
    inv_lines_cnt = 0
    valid_list = [0, 0, 1, 2, 3]  # sorted list of valid values
    lines = f.read().splitlines()
    f.seek(0)
    f.truncate(0)  # truncating the initial file
    for l in lines:
        if sorted(map(int, l.split(','))) == valid_list:
            f.write(l + '\n')
        else:
            inv_lines_cnt += 1

print("There were {} amount of invalid lines.".format(inv_lines_cnt))
The output:
There were 2 amount of invalid lines.
The final filename.txt contents:
1,2,3,0,0
2,3,0,1,0
3,2,1,0,0
Upvotes: 1
Reputation: 77827
Your main problem with detecting line validity seems to be handling the input. You want to convert the input text to a list of values; this is a skill you should get from learning your tools. The ones you need here are split, to divide the line, and int, to convert the values. For instance:
line_vals = line.split(',')
Now iterate through line_vals and convert each value to an integer with int.
To check the counts, look at collections.Counter, which is a convenient type of dictionary that accumulates counts from any sequence. Does that get you moving? If you're still lost, I recommend some time with a local tutor.
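In case it helps, here is a minimal sketch of how split, int and collections.Counter could fit together for this check. The names EXPECTED and is_valid are made up for illustration, and treating a non-numeric field as invalid is an assumption on my part, not something the question specifies.

```python
from collections import Counter

# Hypothetical constant: the counts a valid line must have (two 0's, one each of 1, 2, 3).
EXPECTED = Counter({0: 2, 1: 1, 2: 1, 3: 1})


def is_valid(line):
    """Return True if the line holds exactly one 1, one 2, one 3 and two 0's."""
    try:
        # split divides the line on commas; int converts each piece.
        values = [int(v) for v in line.strip().split(",")]
    except ValueError:
        # Assumption: a non-numeric or empty field makes the line invalid.
        return False
    # Counter tallies how often each value appears; order does not matter.
    return Counter(values) == EXPECTED


sample = ["1,2,3,0,0", "2,3,0,1,0", "0,0,0,1,2", "1,0,3,0,0", "3,2,1,0,0"]
invalid = [line for line in sample if not is_valid(line)]
print("There were {} invalid lines.".format(len(invalid)))
```

With the five sample lines from the question, this reports 2 invalid lines (0,0,0,1,2 and 1,0,3,0,0).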
Upvotes: 2
Reputation: 5354
This is a mostly language-independent problem. What you would do is open another file for writing. As you read each line from the original file, test it for validity. If it passes, write it out to the new file. When you're finished, rename the original file to something else, then rename the new file to the original file.
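In Python, those steps might look roughly like the sketch below. The function names and the .new and .bak suffixes are placeholders I picked for illustration, and the validity test simply compares the sorted fields against the expected values; swap in whatever check you prefer.

```python
import os


def is_valid(line):
    """Valid means the five comma-separated fields are 0, 0, 1, 2, 3 in any order."""
    return sorted(line.strip().split(",")) == ["0", "0", "1", "2", "3"]


def remove_invalid_lines(path):
    """Copy the valid lines of `path` to a new file, then swap the files.

    Returns the number of invalid lines that were dropped.
    """
    new_path = path + ".new"     # hypothetical name for the new file
    backup_path = path + ".bak"  # hypothetical name for the renamed original
    invalid_count = 0
    with open(path, "r") as src, open(new_path, "w") as dst:
        for line in src:
            if is_valid(line):
                # Normalise the line ending so the last line also ends in "\n".
                dst.write(line.rstrip("\n") + "\n")
            else:
                invalid_count += 1
    # Rename the original to something else, then the new file to the original.
    os.replace(path, backup_path)
    os.replace(new_path, path)
    return invalid_count
```

Calling remove_invalid_lines("filename.txt") would leave the untouched original in filename.txt.bak, so nothing is lost if the validity check turns out to be wrong. Note that blank lines count as invalid in this sketch.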
Upvotes: 1