Reputation: 710
I have:
s='"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'
I want to split on the Football and Basket tags, i.e. to get:
['','"Mickael A","Played":"10times"',
'"Bruce B","Played":"8times",',
'"John R","Played":"6times",']
I tried:
s.strip().split(r'"Tag":("Football"|"Basket"),"name":')
But it is not working.
Upvotes: 2
Views: 600
Reputation: 11238
A better approach is to structure this string first. I am assuming that each repeating group of Tag, name and Played describes one person. Once you have a list of dicts, you can easily manipulate the data.
s='"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'
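import ast

l = []

def fun(s):
    # Wrap a single '"key":"value"' pair in braces so it parses as a dict literal
    return '{' + s + '}'

k = s.strip().split(',')
# The fields come in groups of three: Tag, name, Played
for i in range(0, len(k), 3):
    dic = {}
    # Skip the empty last piece left over by the trailing comma
    if len(k[i].split(':')) == 2:
        dic['Tag'] = ast.literal_eval(fun(k[i]))['Tag']
        dic['name'] = ast.literal_eval(fun(k[i+1]))['name']
        dic['Played'] = ast.literal_eval(fun(k[i+2]))['Played']
        l.append(dic)
print(l)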
'''
output
[{'Tag': 'Football', 'name': 'Mickael A', 'Played': '10times'}, {'Tag': 'Basket', 'name': 'Bruce B', 'Played': '8times'}, {'Tag': 'Football', 'name': 'John R', 'Played': '6times'}]
'''
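As an alternative sketch of the same idea, the three fields can be pulled out with one regular expression instead of splitting on commas. This assumes that the fields always appear in the order Tag, name, Played and that no value contains a double quote; the names record and people are only illustrative.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# One record is a Tag, a name and a Played value, in that order.
# Assumption: values never contain a double quote.
record = re.compile(r'"Tag":"([^"]*)","name":"([^"]*)","Played":"([^"]*)"')

# findall returns one (tag, name, played) tuple per record.
people = [
    {"Tag": tag, "name": name, "Played": played}
    for tag, name, played in record.findall(s)
]
print(people)
```

Unlike the comma split, this would not break if a name contained a comma, but it still relies on the fixed field order.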
Upvotes: 0
Reputation: 1237
str.split treats its argument as a literal string, not as a pattern, so you need to use the re library. Make the Football/Basket alternation a non-capturing group so that the matched tags don't appear in the result, like so:
import re
re.split(r'"Tag":(?:"Football"|"Basket"),"name":', s)
The result is:
['', '"Mickael A","Played":"10times",', '"Bruce B","Played":"8times",', '"John R","Played":"6times",']
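To see why the group has to be non-capturing, here is a small sketch comparing the three calls on the string from the question. The variable names literal, capturing and non_capturing are only for illustration.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# str.split looks for its argument as a literal substring, so nothing is split.
literal = s.split(r'"Tag":("Football"|"Basket"),"name":')

# With a capturing group, re.split also returns the text each group matched.
capturing = re.split(r'"Tag":("Football"|"Basket"),"name":', s)

# With a non-capturing group (?:...), only the pieces between matches are returned.
non_capturing = re.split(r'"Tag":(?:"Football"|"Basket"),"name":', s)

print(len(literal))       # 1: the whole string comes back unsplit
print(capturing[:3])      # the matched tag shows up between the pieces
print(non_capturing[:2])
```

If you actually want to keep the tag next to each piece, the capturing version may be the more useful one.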
Upvotes: 2
Reputation: 26039
Analyzing your string, it seems you need:
re.findall(r'"name":(.*?),(?:"Tag"|$)', s)
where s is your string. This finds all occurrences of something (.*?) that is preceded by "name": and followed by ,"Tag" or by a comma at the end of the string (,$).
Full code:
import re
s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'
print(re.findall(r'"name":(.*?),(?:"Tag"|$)', s))
# ['"Mickael A","Played":"10times"', '"Bruce B","Played":"8times"', '"John R","Played":"6times"']
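For comparison, a quick sketch of how this findall result relates to a re.split on the tags: the split keeps a leading empty string and the trailing commas, which findall leaves out. The split pattern used here is an assumption about what you would otherwise write, and the names split_parts, found and cleaned are only illustrative.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

split_parts = re.split(r'"Tag":(?:"Football"|"Basket"),"name":', s)
found = re.findall(r'"name":(.*?),(?:"Tag"|$)', s)

# Drop the leading empty string and strip the trailing comma from each piece.
cleaned = [part.rstrip(',') for part in split_parts if part]

print(cleaned == found)  # True
```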
Upvotes: 2
Reputation: 42017
You can use the following regex with re.split:
"Tag":"[^"]+","name":
"Tag":" matches literally
[^"]+ matches one or more characters that are not ", i.e. it matches up to the next "
","name": matches literally
You can also use the non-greedy pattern .*? instead of [^"]+:
"Tag":".*?","name":
Example:
In [486]: s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'
In [487]: re.split(r'"Tag":"[^"]+","name":', s)
Out[487]:
['',
'"Mickael A","Played":"10times",',
'"Bruce B","Played":"8times",',
'"John R","Played":"6times",']
In [488]: re.split(r'"Tag":".*?","name":', s)
Out[488]:
['',
'"Mickael A","Played":"10times",',
'"Bruce B","Played":"8times",',
'"John R","Played":"6times",']
Upvotes: 1
Reputation: 3235
The re library does what you need.
import re
s='"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'
re.split('Football|Basket', s)
It returns:
['"Tag":"',
'","name":"Mickael A","Played":"10times","Tag":"',
'","name":"Bruce B","Played":"8times","Tag":"',
'","name":"John R","Played":"6times",']
Upvotes: 0