Reputation: 165
I am trying to use re twice to search and split data. For example:
[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
First I find all the substrings within []:
2018-07-10 15:04:11,
2018-07-10 15:04:12,
2018-07-10 15:04:42,
2018-07-10 15:04:42,
Then I am trying to split each one on the space:
2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42
and my code is:
import re
file = re.findall(r'\[(.*?)\]', file)
m = re.split(r'\ +', file)
but it gives me an error and will not let me use re twice.
Any suggestions would be great! Thank you in advance.
Upvotes: 0
Views: 85
Reputation: 12015
>>> sum([date.split() for date in re.findall(r'\[(.*?)\]', file)], [])
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
Or using itertools.chain:
>>> from itertools import chain
>>> list(chain(*re.findall(r'\[(\S+) (\S+)\]', file)))
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
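For anyone who wants to run this outside the interpreter session, here is a minimal self-contained sketch of the same idea. The name `log_text` is an assumption standing in for the contents of the file, shortened to two lines, and `chain.from_iterable` is swapped in for `chain(*...)` so the list of matches does not have to be unpacked into arguments.

```python
import re
from itertools import chain

# log_text is a hypothetical stand-in for the contents of the log file
log_text = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"'''

# The pattern has two groups, so each match is a (date, time) tuple
pairs = re.findall(r'\[(\S+) (\S+)\]', log_text)

# Flatten the tuples into a single list of strings
flat = list(chain.from_iterable(pairs))
print(flat)
```

Either flattening approach should give the same list; `sum(..., [])` builds a new list at every step, so `chain` is probably the better choice for large files.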
Upvotes: 2
Reputation: 1439
Use re.findall() and .split(), since it's not necessary to use regex twice.
import re
a = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''
[item for sublist in [n.split() for n in re.findall(r'\[(.*?)\]',a)] for item in sublist]
['2018-07-10',
'15:04:11',
'2018-07-10',
'15:04:12',
'2018-07-10',
'15:04:42',
'2018-07-10',
'15:04:42']
Upvotes: 1
Reputation: 195543
import re
data = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""
new_data = []
re.sub(r'\[(.*?)\].*', lambda g: new_data.extend(g[1].split()), data)
print(','.join(new_data))
Outputs:
2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42
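If calling re.sub only for its side effect feels awkward, the same result can probably be had with re.finditer. This is a sketch under the same assumptions as above, with `data` shortened to two lines for illustration; it is an alternative, not a correction.

```python
import re

# Shortened sample of the log, for illustration only
data = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''

new_data = []
# finditer yields one match object per bracketed timestamp
for match in re.finditer(r'\[(.*?)\]', data):
    # group(1) is the text inside the brackets; split() breaks it on whitespace
    new_data.extend(match.group(1).split())

joined = ','.join(new_data)
print(joined)
```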
Upvotes: 1
Reputation: 51165
Update your regex to capture each group the first time; there is no need for split at all:
re.findall(r'\[(.*?)\s(.*?)\]', s)
[('2018-07-10', '15:04:11'),
('2018-07-10', '15:04:12'),
('2018-07-10', '15:04:42'),
('2018-07-10', '15:04:42')]
If you need this as a flattened list:
[elem for grp in re.findall(r'\[(.*?)\s(.*?)\]', s) for elem in grp]
['2018-07-10',
'15:04:11',
'2018-07-10',
'15:04:12',
'2018-07-10',
'15:04:42',
'2018-07-10',
'15:04:42']
Upvotes: 1
Reputation: 82785
Your file variable holds a list of strings returned by re.findall, but re.split expects a single string. Split each element instead:
import re
file = re.findall(r'\[(.*?)\]', file)
m = [re.split(r'\ +', i) for i in file]
print(m)
Output:
[['2018-07-10', '15:04:11'], ['2018-07-10', '15:04:12'], ['2018-07-10', '15:04:42'], ['2018-07-10', '15:04:42']]
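To see why the original code fails, and to get the flat list the question asks for, here is a minimal sketch. The names `text`, `matches`, `raised` and `parts` are assumptions chosen for illustration; the sample is a single log line.

```python
import re

# A single sample line, for illustration only
text = '[2018-07-10 15:04:11] USER INPUT "hello"'

# findall returns a list of strings, not a string
matches = re.findall(r'\[(.*?)\]', text)

# Passing that list straight to re.split raises TypeError,
# because re.split needs a string (or bytes) to work on
try:
    re.split(r' +', matches)
    raised = False
except TypeError:
    raised = True

# Splitting each element and flattening in one comprehension
parts = [piece for m in matches for piece in re.split(r' +', m)]
print(raised, parts)
```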
Upvotes: 0