Reputation: 117
I have the following set of strings in a txt file (archiveONE.txt), where I would like to extract what is between [||||]:
[||||]87[||||]2125[||||]1648[||||]2019-04-04 20:17:44[||||]
bla bla bla
bla bla bla
[||||]85[||||]3068[||||]1648[||||]2019-04-04 21:11:44[||||]
bla bla bla
bla bla bla
The end result should be this:
87 2125 1648 2019-04-04 20:17:44
bla bla bla
bla bla bla
85 3068 1648 2019-04-04 21:11:44
bla bla bla
bla bla bla
I tried to use the split function in Python, but either its parameters are too limited or I didn't use it correctly:
import glob, os, re
from re import sub
fp = open("archiveONE.txt", 'r', -1)
codes = fp.readlines()
for i in codes:
    print(i.split("[", 4))
I also tried a regular expression, but it didn't work:
codes = re.sub('(?<=\/[*)[\s\S]*?(?=]*\/)', '', codes)
Could someone help me find a solution?
Upvotes: 1
Views: 56
Reputation: 52
This should match every segment except the last one:
(?<=\]\n*)(.|\n)+?(?=\n*\[)
Upvotes: 0
Reputation: 110665
Assuming the bits to keep cannot contain the characters contained in [||||], I suggest you use re.findall with the regular expression
r'[^[\]|\r\n]+'
The regex reads, "match one or more characters other than left and right brackets, pipes, carriage returns and newlines".
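As a rough sketch of how that pattern could be applied, the snippet below runs re.findall on each line separately and joins the pieces with a space, so the line structure from the question is preserved. The helper name clean_line and the inline sample text are only illustrative assumptions; in practice the lines would come from archiveONE.txt.

```python
import re

# Pattern from the answer: runs of characters that are not brackets,
# pipes, carriage returns or newlines.
FIELD = re.compile(r'[^[\]|\r\n]+')


def clean_line(line):
    """Join the fields found in one line with single spaces (hypothetical helper)."""
    return " ".join(FIELD.findall(line))


# Sample text standing in for the contents of archiveONE.txt.
sample = (
    "[||||]87[||||]2125[||||]1648[||||]2019-04-04 20:17:44[||||]\n"
    "bla bla bla\n"
)

cleaned = [clean_line(line) for line in sample.splitlines()]
print("\n".join(cleaned))
```

Lines without any delimiter come back as a single match, so they pass through unchanged.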
Upvotes: 0
Reputation: 2407
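You can replace the delimiter with a space, then strip the leading and trailing spaces from each line (readlines() returns a list, so the replacement has to be applied per line):
codes = [line.replace("[||||]", " ").strip() for line in fp]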
Upvotes: 0
Reputation: 475
I would suggest simply using the split function, as follows:
.split('[||||]')
So, for example
"[||||]85[||||]3068[||||]1648[||||]2019-04-04 21:11:44[||||]".split("[||||]")
will return:
['', '85', '3068', '1648', '2019-04-04 21:11:44', '']
So just remove the first and the last elements from the list and you are good to go!
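A minimal sketch of this approach applied to whole lines might look like the following. Note one deliberate deviation: instead of slicing off the first and last elements, it filters out empty strings, so that lines without the delimiter (the "bla bla bla" lines) are left intact. The names DELIM, extract_fields and the sample list are assumptions made for illustration.

```python
DELIM = "[||||]"


def extract_fields(line):
    """Split one line on the delimiter and drop the empty edge pieces."""
    parts = line.rstrip("\n").split(DELIM)
    # Leading and trailing delimiters produce empty strings; discard them.
    return [part for part in parts if part]


# Sample lines standing in for the contents of archiveONE.txt.
lines = [
    "[||||]85[||||]3068[||||]1648[||||]2019-04-04 21:11:44[||||]\n",
    "bla bla bla\n",
]

result = [" ".join(extract_fields(line)) for line in lines]
print("\n".join(result))
```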
Upvotes: 1