User2939

Reputation: 165

Using re (regular expressions) twice

I am trying to use re twice, first to search and then to split the data. For example:

[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"

First I find all the substrings inside []:

2018-07-10 15:04:11,
2018-07-10 15:04:12,
2018-07-10 15:04:42,
2018-07-10 15:04:42,

Then I am trying to split each of them on the space:

2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42

and my code is:

import re

file = re.findall(r'\[(.*?)\]', file)
m = re.split(r'\ +', file)

but it gives me an error and won't let me use re twice.

Any suggestions would be great! Thank you in advance.

Upvotes: 0

Views: 85

Answers (5)

Sunitha

Reputation: 12015

>>> sum([date.split() for date in re.findall(r'\[(.*?)\]', file)], [])
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']

Or, using itertools.chain:

>>> from itertools import chain
>>> list(chain(*re.findall(r'\[(\S+) (\S+)\]', file)))
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
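A side note on the first form: sum(..., []) builds a new list at every step, which is fine for a handful of lines but can get slow on a large file. A minimal sketch of the same flattening with itertools.chain.from_iterable, assuming the log text is held in a string called text (a name picked here only for illustration):

```python
import re
from itertools import chain

# Sample log text from the question; the variable name `text` is an assumption.
text = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"'''

# Each bracketed timestamp is split on whitespace into [date, time].
pairs = (stamp.split() for stamp in re.findall(r'\[(.*?)\]', text))

# chain.from_iterable flattens the pairs lazily, without repeated list copies.
flat = list(chain.from_iterable(pairs))
print(flat)
```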

Upvotes: 2

W Stokvis

Reputation: 1439

Use re.findall() and str.split(); it's not necessary to use a regex twice.

import re
a = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''


[item for sublist in [n.split() for n in re.findall(r'\[(.*?)\]',a)] for item in sublist]
['2018-07-10',
 '15:04:11',
 '2018-07-10',
 '15:04:12',
 '2018-07-10',
 '15:04:42',
 '2018-07-10',
 '15:04:42']

Upvotes: 1

Andrej Kesely

Reputation: 195543

import re

data = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""

new_data = []
re.sub(r'\[(.*?)\].*', lambda g: new_data.extend(g[1].split()), data)
print(','.join(new_data))

Outputs:

2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42

Upvotes: 1

user3483203

Reputation: 51165

Update your regex to capture each group the first time; there is no need for split at all:

re.findall(r'\[(.*?)\s(.*?)\]', s)

[('2018-07-10', '15:04:11'),
 ('2018-07-10', '15:04:12'),
 ('2018-07-10', '15:04:42'),
 ('2018-07-10', '15:04:42')]

If you need this as a flattened list:

[elem for grp in re.findall(r'\[(.*?)\s(.*?)\]', s) for elem in grp]

['2018-07-10',
 '15:04:11',
 '2018-07-10',
 '15:04:12',
 '2018-07-10',
 '15:04:42',
 '2018-07-10',
 '15:04:42']
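One caveat with a loose pattern like this: if a message itself contains square brackets, those get picked up too. A hedged sketch of a stricter variant, assuming every timestamp sits at the start of a line in the YYYY-MM-DD HH:MM:SS form shown in the question (the sample message with "[note 1]" is made up for illustration):

```python
import re

# Hypothetical input: the first message contains its own bracketed text.
s = '''[2018-07-10 15:04:11] USER INPUT "see [note 1]"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "ok"'''

# The loose pattern also captures ('note', '1') from inside the message.
loose = re.findall(r'\[(.*?)\s(.*?)\]', s)

# Anchoring at line start and spelling out the digits keeps only timestamps.
strict = re.findall(
    r'^\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\]', s, flags=re.MULTILINE
)
print(strict)
```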

Upvotes: 1

Rakesh

Reputation: 82785

After the re.findall call, your file variable is a list of strings, but re.split expects a single string. Split each element instead.

Try:

import re

file = re.findall(r'\[(.*?)\]', file)
m = [re.split(r'\ +', i) for i in file]
print(m)

Output:

[['2018-07-10', '15:04:11'], ['2018-07-10', '15:04:12'], ['2018-07-10', '15:04:42'], ['2018-07-10', '15:04:42']]
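To make the cause of the error concrete, here is a minimal sketch that reproduces it and then produces the comma-separated form from the question. It assumes the log text is in a string named log (a name chosen here for illustration only):

```python
import re

# Sample log text from the question; the variable name `log` is an assumption.
log = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"'''

stamps = re.findall(r'\[(.*?)\]', log)  # a list of strings, not a string

try:
    re.split(r' +', stamps)  # passing the whole list is what fails
    failed = False
except TypeError:
    failed = True

# Split each timestamp separately, then flatten and join with commas.
parts = [piece for stamp in stamps for piece in re.split(r' +', stamp)]
joined = ','.join(parts)
print(joined)
```

Using re twice is not the problem in itself; the second call just has to receive one string at a time.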

Upvotes: 0
