Reputation: 79
I have a text file that contains a number of lines. Some lines begin with a code of fixed length. I want to find that code and delete it, along with the space after it. Here is an example of the list:
MARVEL COMICS
JUN130675 AGE OF ULTRON HC $75.00
JUL130663 ALL NEW X-MEN #16 $3.99
JUL130606 AVENGERS AI #3 $2.99
JUL130642 DAREDEVIL DARK NIGHTS #4
I want to find that set of 9 characters at the start of the line and delete it, plus the whitespace that follows. The code is always 9 characters long and always at the start of the line. The text file contains many lines, so I want to step through each line and save the output in a new text file. The name of the starting file is final.txt. My language of choice is Python.
Thanks
Upvotes: 0
Views: 288
Reputation: 36272
One way is to use a regular expression. It matches only the lines you want to shorten and discards their initial word, leaving the other lines unchanged.
import fileinput
import re
for line in fileinput.input():
    print(re.sub(r'^\w{3}\d{6}\s+', '', line), end='')
Run it like:
python3 script.py final.txt >outfile
It yields:
MARVEL COMICS
AGE OF ULTRON HC $75.00
ALL NEW X-MEN #16 $3.99
AVENGERS AI #3 $2.99
DAREDEVIL DARK NIGHTS #4
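If you would rather write the new file from Python than redirect standard output, the same idea can be sketched as below. The names CODE_PREFIX, strip_code and convert are hypothetical, and the pattern is an assumed, slightly stricter variant (three capital letters, six digits, then spaces or tabs) based only on the sample lines in the question.

```python
import re

# Assumption: a code is three capital letters plus six digits, followed by
# at least one space or tab. [ \t]+ is used instead of \s+ so that the
# trailing newline is never consumed.
CODE_PREFIX = re.compile(r'^[A-Z]{3}\d{6}[ \t]+')

def strip_code(line):
    """Return the line without its leading code; other lines pass through."""
    return CODE_PREFIX.sub('', line, count=1)

def convert(src_path, dst_path):
    """Copy src_path to dst_path, removing the leading code from each line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(strip_code(line))
```

Calling convert('final.txt', 'outfile') would then produce the same output file as the shell redirection above, provided the assumed pattern really describes every code in the file.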
Upvotes: 1
Reputation: 365777
What is the rule that tells you that "JUN130675" is something to skip, but "MARVEL CO" is not? If you can describe the rule in English, you can describe it in code.
For example, maybe the rule is just that "JUN130675" is nothing but letters and numbers, while "MARVEL CO" has something else in the middle (a space). Let's write that in Python:
def fix_line(line):
    if line[:9].isalnum():
        return line[9:].lstrip()
    else:
        return line
That line[:9] gets the first 9 characters, and isalnum checks that all of them are letters or numbers. If so, line[9:] skips the first 9 characters, and lstrip skips the space after them.
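As a quick check of that rule, here is a small self-contained sketch that restates fix_line (slicing at index 9, so exactly the 9-character code is dropped) and runs it on two lines from the question. The third sample line is hypothetical and is there only to show how far the rule reaches.

```python
def fix_line(line):
    # Rule under test: the first 9 characters are all letters or digits.
    if line[:9].isalnum():
        return line[9:].lstrip()
    else:
        return line

samples = [
    "MARVEL COMICS\n",                      # "MARVEL CO" contains a space
    "JUN130675 AGE OF ULTRON HC $75.00\n",  # starts with a 9-character code
    "SPIDERMAN 2099 #1\n",                  # hypothetical title, not from the question
]
for sample in samples:
    print(repr(fix_line(sample)))
```

The first line comes back unchanged and the second loses its code. The hypothetical third line loses "SPIDERMAN" as well, because a nine-letter word also satisfies this rule, which is why it pays to state the rule as precisely as you can.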
Then we just apply that to each line:
with open('input.txt') as fin, open('output.txt', 'w') as fout:
    for line in fin:
        fout.write(fix_line(line))
Or, if the rule is that it has to be letters and numbers, and only capital letters, and must be followed by a space… that's just three conditions joined by and in Python, just as in English, so write it that way:
def fix_line(line):
    if line[:9].isalnum() and line[:9] == line[:9].upper() and line[9:10].isspace():
        # the body is unchanged: drop the code, then the space after it
        return line[9:].lstrip()
    else:
        return line
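To see the stricter rule working end to end, the sketch below applies it to a few lines held in memory. The helper fix_stream and the lower-case sample line are hypothetical, and io.StringIO stands in for the real files so the example runs without touching the disk.

```python
import io

def fix_line(line):
    code = line[:9]
    # Three conditions: letters/digits only, no lower-case letters,
    # and a whitespace character right after the code.
    if code.isalnum() and code == code.upper() and line[9:10].isspace():
        return line[9:].lstrip()
    return line

def fix_stream(fin, fout):
    """Write every line of fin to fout with its leading code removed."""
    for line in fin:
        fout.write(fix_line(line))

sample = io.StringIO(
    "MARVEL COMICS\n"
    "JUN130675 AGE OF ULTRON HC $75.00\n"
    "Jul130663 lower-case code stays\n"   # hypothetical line
)
result = io.StringIO()
fix_stream(sample, result)
print(result.getvalue(), end='')
```

With real files, the same function can be called as fix_stream(fin, fout) inside a with open('final.txt') as fin, open('output.txt', 'w') as fout: block.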
Upvotes: 4