Reputation: 565
Hi all. I'm using an example from SO and tried to remove a few lines/strings from a text file, but without success. The lines that need to be deleted are, for example:
OSPF Process 1 with Router ID 1.1.1.1
Area: 0.0.0.11
Link State Database
I am able to delete those lines by specifying the exact string as below, but this removes only one line at a time. Another problem is that the Router ID and Area can be any number and change dynamically.
filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
    for line in lines:
        if 'Area: 0.0.0.10' not in line:
            fout.write(line)
I tried using startswith, but it doesn't remove the lines.
if not line.startswith('OSPF'):
This is how the strings are laid out in the text file. The OSPF..., Area..., and Link... lines do not start at the left margin; they start with whitespace, so I think this is why startswith does not work.
OSPF Process 1 with Router ID 1.1.1.1
Area: 0.0.0.11
Link State Database
some textxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OSPF Process 1 with Router ID 2.1.1.1
Area: 0.0.0.12
Link State Database
some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OSPF Process 1 with Router ID 2.2.2.2
Area: 0.0.0.33
Link State Database
some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The expected result after removing those lines is as below:
some textxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
some textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Please advise further, and thank you.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
OSPF Process 1 with Router ID 1.1.1.1
Area: 0.0.0.11
Link State Database
For example, for the 5 lines above, running the script removes 3 lines but 2 lines still remain.
Another example:
* Link ID: 10.1.155.20
Data : 255.255.255.252
Link Type: StubNet
Metric : 1
Priority : Low
Area: 0.0.0.13
Link State Database
Type : Router
Ls id : 1.4.0.2
Adv rtr : 1.4.0.2
This has 4 lines (from Area up to just before Type). When executing the script, only 2 lines are removed and 2 remain. For this, the final result should be like below:
* Link ID: 10.1.155.20
Data : 255.255.255.252
Link Type: StubNet
Metric : 1
Priority : Low
Type : Router
Ls id : 1.4.0.2
Adv rtr : 1.4.0.2
Remove specific string and line and also its next line (after Link State Database line)
clean.txt
**To remove this empty line
To remove this empty line
To remove this empty line**
Type : Router
Ls id : 1.4.0.1
Adv rtr : 1.4.0.1
Ls age : 996
Len : 48
Options : ASBR E
seq# : 8000002f
chksum : 0xe7f5
Link count: 2
* Link ID: 1.16.9.9
Data : 10.1.155.2
Link Type: P-2-P
Metric : 100
* Link ID: 10.1.155.20
Data : 255.255.255.252
Link Type: StubNet
Metric : 100
Priority : Low
Type : Router
Ls id : 1.16.9.9
Adv rtr : 1.16.9.9
Ls age : 392
Len : 48
Options : ABR E
seq# : 8000001e
chksum : 0x3116
Link count: 2
* Link ID: 1.4.0.1
Data : 10.242.177.21
Link Type: P-2-P
Metric : 1
* Link ID: 10.1.155.20
Data : 255.255.255.252
Link Type: StubNet
Metric : 1
Priority : Low
**To remove this empty line**
Type : Router
Ls id : 1.4.0.2
Adv rtr : 1.4.0.2
Ls age : 1194
Len : 96
Options : ASBR E
seq# : 8001cf7b
chksum : 0xbfae
Link count: 6
* Link ID: 1.4.0.2
Data : 255.255.255.255
Link Type: StubNet
Metric : 0
Priority : Medium
* Link ID: 1.4.0.1
Data : 10.0.0.2
Link Type: P-2-P
Metric : 10
* Link ID: 10.0.0.0
Data : 255.255.255.252
Link Type: StubNet
Metric : 10
Priority : Low
* Link ID: 10.40.8.0
Data : 255.255.255.252
Link Type: StubNet
Metric : 100
Priority : Low
* Link ID: 19.23.23.15
Data : 10.40.10.130
Link Type: P-2-P
Metric : 10
* Link ID: 1.4.10.200
Data : 255.255.255.252
Link Type: StubNet
Metric : 10
Priority : Low
To remove this empty line
Type : Router
Ls id : 100.100.0.10
Adv rtr : 100.100.0.10
Ls age : 171
Len : 84
Options : ASBR E
seq# : 8001a292
chksum : 0x5fa2
Link count: 5
* Link ID: 100.100.0.10
Data : 255.255.255.255
Link Type: StubNet
Metric : 12
Priority : Medium
* Link ID: 10.10.0.1
Data : 10.10.10.18
Link Type: P-2-P
Metric : 10
* Link ID: 10.10.10.17
Data : 255.255.255.255
Link Type: StubNet
Metric : 10
Priority : Medium
* Link ID: 19.23.23.15
Data : 10.10.30.30
Link Type: P-2-P
Metric : 10
* Link ID: 10.90.25.30
Data : 255.255.255.255
Link Type: StubNet
Metric : 10
Priority : Medium
Type : Router
Ls id : 10.10.0.1
Adv rtr : 10.10.0.1
Ls age : 191
Len : 96
Options : ASBR E
seq# : 80013bcf
chksum : 0x9871
Link count: 6
* Link ID: 10.10.0.1
Data : 255.255.255.255
Link Type: StubNet
Metric : 12
Priority : Medium
* Link ID: 15.51.51.14
Data : 10.10.0.130
Link Type: P-2-P
Metric : 10
* Link ID: 10.10.0.129
Data : 255.255.255.255
Link Type: StubNet
Metric : 10
Priority : Medium
* Link ID: 100.100.0.10
Data : 10.10.10.17
Link Type: P-2-P
Metric : 10
* Link ID: 10.10.10.18
Data : 255.255.255.255
Link Type: StubNet
Metric : 10
Priority : Medium
* Link ID: 16.16.16.0
Data : 255.255.255.252
Link Type: StubNet
Metric : 10
Priority : Low
Type : Router
Ls id : 15.51.51.14
Adv rtr : 15.51.51.14
Ls age : 2487
Len : 60
Options : ASBR ABR E
seq# : 8000003c
chksum : 0x1714
Link count: 3
* Link ID: 10.242.95.12
Data : 255.255.255.252
Link Type: StubNet
Metric : 1
Priority : Low
* Link ID: 10.10.0.1
Data : 10.10.0.129
Link Type: P-2-P
Metric : 1
* Link ID: 10.10.0.128
Data : 255.255.255.252
Link Type: StubNet
Metric : 1
Priority : Low
**To remove this empty line
To remove this empty line**
Upvotes: 1
Views: 1998
Reputation: 163217
Instead of reading line by line, you could read the entire content of the text file and use a pattern for that specific match, taking into account the digit parts that can vary.
^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database\s*(?:\n|$)
Explanation
^
    Start of string (or of each line, with re.MULTILINE)
[ \t]*
    Match 0+ times a space or tab
OSPF Process \d+ with Router ID \d+(?:\.\d+){3}
    Match text taking the format of the digits \d+ for Process and Router ID into account
\s*Area: \d+(?:\.\d+){3}
    Match Area: followed by 1+ digits and repeat 3 times a dot and 1+ digits
\s*Link State Database
    Match 0+ times a whitespace char and literal text
\s*(?:\n|$)
    Match 0+ times a whitespace char and then match either a newline or assert the end of the string
For example:
import re

filename = 'raw.txt'
pattern = r"^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database\s*(?:\n|$)"

with open(filename, 'r') as fin:
    # re.MULTILINE lets ^ match at the start of every line
    res = re.sub(pattern, "", fin.read(), flags=re.MULTILINE)

# Write the cleaned text; the with block closes the file automatically
with open("clean.txt", "w") as text_file:
    text_file.write(res)
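The substitution can also be tried on an in-memory string before touching any files. This is a minimal sketch using the same pattern; the sample text is made up to mimic the layout in the question, and the helper name strip_headers is hypothetical.

```python
import re

# Same pattern as in the answer, split across lines for readability.
HEADER = (
    r"^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}"
    r"\s*Area: \d+(?:\.\d+){3}"
    r"\s*Link State Database\s*(?:\n|$)"
)


def strip_headers(text):
    """Remove every three-line OSPF / Area / Link State Database header."""
    return re.sub(HEADER, "", text, flags=re.MULTILINE)


# Made-up sample: two headers, each followed by one line of payload.
sample = (
    "     OSPF Process 1 with Router ID 1.1.1.1\n"
    "             Area: 0.0.0.11\n"
    "     Link State Database\n"
    "some text\n"
    "     OSPF Process 1 with Router ID 2.1.1.1\n"
    "             Area: 0.0.0.12\n"
    "     Link State Database\n"
    "more text\n"
)

cleaned = strip_headers(sample)
print(cleaned)
```

Because the Router ID and Area are matched with \d+ groups rather than literal numbers, both headers are removed even though their values differ.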
Edit
To also match an empty line after the header, you could add this after Database:
[ \t]*
    Match 0+ times a space or tab
(?:
    Non capturing group
(?:\r?\n|\r)[ \t]*
    Match a newline followed by matching 0+ times a tab or space
)?
    Close non capturing group and make it optional
$
    Assert end of the string (or of the line, with re.MULTILINE)
Full pattern:
^[ \t]*OSPF Process \d+ with Router ID \d+(?:\.\d+){3}\s*Area: \d+(?:\.\d+){3}\s*Link State Database[ \t]*(?:(?:\r?\n|\r)[ \t]*)?$
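If the goal is to drop the header together with the blank lines on either side of it, as in the expected output in the question, a line-based pass is another option. The sketch below is not the pattern above but a different technique; the helper name drop_headers_and_blanks, the HEADER_RE expression, and the sample lines are all assumptions made for illustration.

```python
import re

# Assumed header shapes, taken from the sample lines in the question.
HEADER_RE = re.compile(
    r"\s*(?:OSPF Process \d+|Area: \d+(?:\.\d+){3}|Link State Database)"
)


def drop_headers_and_blanks(lines):
    """Drop header lines plus blank lines directly before or after them."""
    kept = []
    after_header = False
    for line in lines:
        if HEADER_RE.match(line):
            # Discard blank lines collected just before this header.
            while kept and not kept[-1].strip():
                kept.pop()
            after_header = True
        elif after_header and not line.strip():
            # Blank line trailing a header: skip it.
            continue
        else:
            after_header = False
            kept.append(line)
    return kept


# Made-up sample with blank lines around an Area / Link State Database header.
sample = [
    "   Priority : Low\n",
    "\n",
    "             Area: 0.0.0.13\n",
    "     Link State Database\n",
    "\n",
    "  Type : Router\n",
]

result = drop_headers_and_blanks(sample)
print("".join(result))
```

Blank lines that are not adjacent to a header are kept, so spacing elsewhere in the file is left alone.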
Upvotes: 1
Reputation: 3346
You can use a regular expression to find specific text and remove it. Below is sample code; you can experiment with different regexes as your requirements dictate.
Try the code below:
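import re

# "Link" alone would also match the "Link ID" / "Link Type" data lines, so match "Link State"
regex = "OSPF|Area|Link State"
# lines is the list read with fin.readlines() in the question
for line in lines:
    if not re.search(regex, line):
        print(line, end="")  # Python 3 print; each line already ends with a newline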
Upvotes: 1
Reputation: 82899
Note that the line does not start with OSPF, but with a bunch of spaces, and then OSPF. Try to strip the line first. Also, startswith can take a tuple of possible prefixes, so you can check all in one go.
for line in lines:
    if not line.strip().startswith(("OSPF", "Area", "Link State")):
        fout.write(line)
Note that this might fail if some of the lines in the actual text also start with Area or similar.
You could also use a regular expression to ensure that the line has to start with some spaces, and then one of those key words:
import re
for line in lines:
    if not re.match(r"\s+(Area|OSPF|Link State)", line):
        fout.write(line)
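The two checks above behave differently on lines that begin with a keyword but are not indented. A small sketch comparing them; the helper names keep_by_prefix and keep_by_regex are hypothetical, and the sample lines (including the unindented "Area of interest" line) are made up to show the difference.

```python
import re

PREFIXES = ("OSPF", "Area", "Link State")
HEADER_RE = re.compile(r"\s+(Area|OSPF|Link State)")


def keep_by_prefix(lines):
    """Keep lines whose stripped text does not start with a header prefix."""
    return [ln for ln in lines if not ln.strip().startswith(PREFIXES)]


def keep_by_regex(lines):
    """Keep lines that are not 'leading whitespace + header keyword'."""
    return [ln for ln in lines if not HEADER_RE.match(ln)]


sample = [
    "     OSPF Process 1 with Router ID 1.1.1.1\n",
    "             Area: 0.0.0.11\n",
    "     Link State Database\n",
    "  Link Type: StubNet\n",   # data line, must survive
    "Area of interest\n",      # made-up unindented text line
]

print(keep_by_prefix(sample))
print(keep_by_regex(sample))
```

The prefix check drops the unindented "Area of interest" line along with the headers, while the regex keeps it because it requires at least one leading whitespace character. Both keep the "Link Type" line, since the keyword is "Link State" rather than "Link".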
Upvotes: 1