Arli94

Reputation: 710

Python: string split with either A or B

I have:

 s='"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

I want to split on either Football or Basket, i.e. to get:

['', '"Mickael A","Played":"10times",',
'"Bruce B","Played":"8times",',
'"John R","Played":"6times",']

I tried:

s.strip().split(r'"Tag":("Football"|"Basket"),"name":')

But it is not working.

Upvotes: 2

Views: 600

Answers (5)

sahasrara62

Reputation: 11238

A better approach is to give this string some structure. I am assuming that each repeated Tag, name and Played triple belongs to one person. Once you have a list of dicts, you can easily manipulate the data.

import ast

s='"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'


def fun(fragment):
    # Wrap a single '"key":"value"' fragment in braces so literal_eval parses it as a dict
    return '{' + fragment + '}'


k = s.strip().split(',')

l = []
# Every record is three consecutive fragments: Tag, name, Played
for i in range(0, len(k), 3):
    dic = {}
    # Skip the empty fragment left over by the trailing comma
    if len(k[i].split(':')) == 2:
        dic['Tag'] = ast.literal_eval(fun(k[i]))['Tag']
        dic['name'] = ast.literal_eval(fun(k[i+1]))['name']
        dic['Played'] = ast.literal_eval(fun(k[i+2]))['Played']
        l.append(dic)
print(l)
'''
output

[{'Tag': 'Football', 'name': 'Mickael A', 'Played': '10times'}, {'Tag': 'Basket', 'name': 'Bruce B', 'Played': '8times'}, {'Tag': 'Football', 'name': 'John R', 'Played': '6times'}]

'''
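If the data really is a flat sequence of "key":"value" pairs, the same list of dicts can also be built with the standard json module. This is a sketch of an alternative, not the method used above, and it assumes the string becomes a valid JSON object once the trailing comma is dropped and braces are added. Passing object_pairs_hook=list keeps the duplicate "Tag"/"name"/"Played" keys as an ordered list of pairs instead of letting later keys overwrite earlier ones; it also assumes every "Tag" key starts a new record.

```python
import json

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# Drop the trailing comma and wrap in braces so the text is a JSON object.
# object_pairs_hook=list returns the (key, value) pairs in order, duplicates included.
pairs = json.loads('{' + s.strip().rstrip(',') + '}', object_pairs_hook=list)

records = []
for key, value in pairs:
    # Assumption: every "Tag" key opens a new record
    if key == 'Tag':
        records.append({})
    records[-1][key] = value

print(records)
```

Because the JSON parser handles the quoting, this version would also cope with a name that contains a comma or a colon, which splitting on ',' and ':' would not.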

Upvotes: 0

painor

Reputation: 1237

str.split treats its argument as a literal string, not a regular expression. What you need is the re module, and you should make the Football/Basket alternation a non-capturing group so that it doesn't appear in the result, like so:

import re
re.split(r'"Tag":(?:"Football"|"Basket"),"name":', s)

The result is:

['', '"Mickael A","Played":"10times",', '"Bruce B","Played":"8times",', '"John R","Played":"6times",']
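A small comparison, sketched with the question's string, shows why the original attempt fails and why the group has to be non-capturing. str.split searches for the pattern as a literal substring and never finds it; re.split with a capturing group inserts the captured text between the pieces; only the (?:...) version gives the list asked for.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# str.split looks for this exact text, so nothing is split
literal = s.split(r'"Tag":("Football"|"Basket"),"name":')

# re.split with a capturing group also returns the captured tag between pieces
capturing = re.split(r'"Tag":("Football"|"Basket"),"name":', s)

# a non-capturing group (?:...) returns only the pieces between matches
non_capturing = re.split(r'"Tag":(?:"Football"|"Basket"),"name":', s)

print(len(literal))    # 1
print(capturing[:3])   # ['', '"Football"', '"Mickael A","Played":"10times",']
print(non_capturing)
```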

Upvotes: 2

Austin

Reputation: 26039

Analyzing your string, it seems you need:

re.findall(r'"name":(.*?),(?:"Tag"|$)', s)

where s is your string. This finds every shortest run of characters (.*?) that is preceded by "name": and followed either by ,"Tag" or by a comma at the end of the string.

Full code:

import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

print(re.findall(r'"name":(.*?),(?:"Tag"|$)', s))
# ['"Mickael A","Played":"10times"', '"Bruce B","Played":"8times"', '"John R","Played":"6times"']
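When a pattern has more than one group, re.findall returns a tuple of the groups for each match instead of a single string. The sketch below uses that to pull the three fields out directly; the pattern is illustrative rather than part of the answer, and it assumes every value is a double-quoted string with no embedded quotes.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# One group per field; [^"]* stops at the closing quote of each value
pattern = r'"Tag":"([^"]*)","name":"([^"]*)","Played":"([^"]*)"'

rows = re.findall(pattern, s)
print(rows)
# [('Football', 'Mickael A', '10times'), ('Basket', 'Bruce B', '8times'), ('Football', 'John R', '6times')]

# Pair each tuple with the field names to get one dict per person
records = [dict(zip(('Tag', 'name', 'Played'), row)) for row in rows]
```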

Upvotes: 2

heemayl

Reputation: 42017

You can use the following Regex with re.split:

"Tag":"[^"]+","name":
  • "Tag":" matches literally

  • [^"]+ matches one or more characters that are not ", i.e. everything up to the next "

  • ","name": matches literally

You can also use the non-greedy pattern .*?" instead of [^"]+:

"Tag":".*?","name":

Example:

In [486]: s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

In [487]: re.split(r'"Tag":"[^"]+","name":', s)
Out[487]: 
['',
 '"Mickael A","Played":"10times",',
 '"Bruce B","Played":"8times",',
 '"John R","Played":"6times",']

In [488]: re.split(r'"Tag":".*?","name":', s)
Out[488]: 
['',
 '"Mickael A","Played":"10times",',
 '"Bruce B","Played":"8times",',
 '"John R","Played":"6times",']
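The ? in .*? matters. The sketch below uses a greedy .* purely for illustration: the match then runs from the first "Tag" to the last ","name": and swallows the first two records.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# Greedy: .* extends as far as possible, so only one (very long) match is found
greedy = re.split(r'"Tag":".*","name":', s)

# Non-greedy: .*? stops at the first ","name": after each "Tag"
lazy = re.split(r'"Tag":".*?","name":', s)

print(greedy)     # ['', '"John R","Played":"6times",']
print(len(lazy))  # 4
```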

Upvotes: 1

Amir Imani

Reputation: 3235

The re module does what you need:

import re

s='"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'
re.split('Football|Basket', s)

it returns

['"Tag":"',
 '","name":"Mickael A","Played":"10times","Tag":"',
 '","name":"Bruce B","Played":"8times","Tag":"',
 '","name":"John R","Played":"6times",']
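Splitting on the tag words themselves leaves fragments such as '"Tag":"' and '","name":' in every piece. If the goal is one piece per record instead, a different technique (not the one in this answer) is to split on the comma just before each "Tag" key, using a lookahead so the key stays with the following piece. This sketch assumes every record starts with "Tag":.

```python
import re

s = '"Tag":"Football","name":"Mickael A","Played":"10times","Tag":"Basket","name":"Bruce B","Played":"8times","Tag":"Football","name":"John R","Played":"6times",'

# Drop the trailing comma, then split on each comma that is followed by "Tag":
# The lookahead (?=...) checks for "Tag": without consuming it.
records = re.split(r',(?="Tag":)', s.rstrip(','))
for record in records:
    print(record)
```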

Upvotes: 0
