Reputation: 41
I need to convert the base string to the target string. I have working code right now, but if there is a "," character in the tvg-name value, the code breaks. How can I fix this bug?
Base Working String: {tvg-id: , tvg-name: A beautiful Day - 2016, tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg, group-title: 2017-16-15 Germany Cinema}
Base Problem String: {tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, group-title: 2017-16-15 Germany Cinema}
Target: {"tvg-id": "None", "tvg-name": "Antonio, ihm schmeckt's nicht! (2016)", "tvg-logo": "https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg", "group-title": "2017-16-15 Germany Cinema"}
```python
def convert(example):
    # split the string into a list
    example = example.replace("{", "").replace("}", "").split(",")
    # create a dictionary
    final = {}
    # loop through the list
    for i in example:
        # split the string into a list
        i = i.split(":")
        # if http or https is in the list, merge with the next item
        if "http" in i[1] or "https" in i[1]:
            i[1] = i[1] + ":" + i[2]
            i.pop(2)
        # remove first char whitespace
        if i[0][0] == " ":
            i[0] = i[0][1:]
        # remove first char whitespace
        if i[1][0] == " ":
            i[1] = i[1][1:]
        final[i[0]] = i[1]
    # return the dictionary
    return final
```
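A quick diagnostic, sketched here with the problem string from above, shows where it goes wrong: splitting on every comma produces a fragment that contains no ':' at all, so after i.split(":") there is no i[1], and the loop raises an IndexError.

```python
# The first step of convert(): strip the braces and split on every comma
problem = ("{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), "
           "tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, "
           "group-title: 2017-16-15 Germany Cinema}")
parts = problem.replace("{", "").replace("}", "").split(",")

for part in parts:
    # A fragment that splits into a single piece has no value part (no i[1])
    print(repr(part), "->", len(part.split(":")), "piece(s)")
```

The third fragment, " ihm schmeckt's nicht! (2016)", is the one that has no colon.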
Upvotes: 0
Views: 197
Reputation: 163362
You can check if the string starts with { and ends with }, and then match the key-value pairs.
The pattern to match the keys and the values:
([^\s:,{}]+):\s*([^,{}]*)
Explanation
([^\s:,{}]+) Capture group 1: match 1+ chars other than a whitespace char, ':', ',', '{' or '}'
:\s* Match a colon followed by optional whitespace chars
([^,{}]*) Capture group 2: match optional chars other than ',', '{' or '}'
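The same pattern can be written with re.VERBOSE, which lets each piece carry the explanation above as an inline comment. This is only a restatement of the pattern, shown on a shortened sample string (shortened as an assumption, for brevity); it matches exactly what the compact form matches.

```python
import re

# Same pattern as above; in VERBOSE mode whitespace and '#' comments outside
# character classes are ignored, so each part can be annotated.
PATTERN = re.compile(
    r"""
    ([^\s:,{}]+)   # group 1: the key, 1+ chars other than whitespace, ':', ',', '{', '}'
    :\s*           # a colon followed by optional whitespace
    ([^,{}]*)      # group 2: the value, 0+ chars other than ',', '{', '}'
    """,
    re.VERBOSE,
)

sample = "{tvg-id: , tvg-name: A beautiful Day - 2016, group-title: 2017-16-15 Germany Cinema}"
pairs = PATTERN.findall(sample)
print(pairs)
```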
See a regex demo and a Python demo
```python
import re

strings = [
    "{tvg-id: , tvg-name: A beautiful Day - 2016, tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg, group-title: 2017-16-15 Germany Cinema}",
    "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, group-title: 2017-16-15 Germany Cinema}"
]

def convert(example):
    pattern = r"([^\s:,{}]+):\s*([^,{}]*)"
    dct = {}
    # Only parse strings that are wrapped in curly braces
    if example.startswith("{") and example.endswith("}"):
        for key, value in re.findall(pattern, example):
            # Store empty (or whitespace-only) values as None
            dct[key] = value if value.strip() else None
    return dct

for s in strings:
    print(convert(s))
```
Output
{'tvg-id': None, 'tvg-name': 'A beautiful Day - 2016', 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg', 'group-title': '2017-16-15 Germany Cinema'}
{'tvg-id': None, 'tvg-name': 'Antonio', 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg', 'group-title': '2017-16-15 Germany Cinema'}
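Note that with this pattern the tvg-name value of the second string stops at the first comma ('Antonio'), so that output does not yet match the target. One possible adjustment, sketched here as an assumption rather than as part of the answer above, is to match the value lazily and end it only where a lookahead sees the next ", key:" or the closing "}". It would still cut a value that itself contains something like ", word:".

```python
import re

# Assumed variant of the pattern above (not the answer's original pattern):
# group 2 is matched lazily and ends only where a lookahead sees either
# ", <key>:" (the start of the next pair) or the closing "}" at the end.
PATTERN = r"([^\s:,{}]+):\s*(.*?)\s*(?=,\s*[^\s:,{}]+:|\}$)"


def convert(example):
    dct = {}
    if example.startswith("{") and example.endswith("}"):
        for key, value in re.findall(PATTERN, example):
            # Store empty values as None, like the answer above
            dct[key] = value if value else None
    return dct


problem = ("{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), "
           "tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, "
           "group-title: 2017-16-15 Germany Cinema}")
print(convert(problem))
```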
Upvotes: 0
Reputation: 13242
A regular expression handles this well:
```python
import re

def convert(s):
    s = s[1:-1]  # Remove {}
    # Split on commas followed by a space, then a group of characters that ends in ':'
    s = re.split(r', (?=\S+:)', s)
    # Split each of these groups on the first ': '. Now it's basically a dict.
    return dict(i.split(': ', 1) for i in s)
```

```python
>>> x = '{tvg-id: , tvg-name: A beautiful Day - 2016, tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg, group-title: 2017-16-15 Germany Cinema}'
>>> print(convert(x))
{'tvg-id': '', 'tvg-name': 'A beautiful Day - 2016', 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg', 'group-title': '2017-16-15 Germany Cinema'}
>>> x = "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, group-title: 2017-16-15 Germany Cinema}"
>>> print(convert(x))
{'tvg-id': '', 'tvg-name': "Antonio, ihm schmeckt's nicht! (2016)", 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg', 'group-title': '2017-16-15 Germany Cinema'}
```
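To see what the lookahead does, here is the intermediate list that re.split produces for a shortened version of the problem string (tvg-logo dropped for readability). The ", " after "Antonio" survives because "ihm" is followed by a space, not a ':'. The second call uses a made-up value containing ", Kapitel:" to show the limitation: such a value would be split as if it started a new key.

```python
import re

# Shortened version of the problem string (tvg-logo dropped for readability)
s = "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), group-title: 2017-16-15 Germany Cinema}"

# Same split as in convert(): a ", " only counts if "<non-space chars>:" follows it
parts = re.split(r', (?=\S+:)', s[1:-1])
print(parts)

# Hypothetical value containing ", Kapitel:", which looks just like the next key
print(re.split(r', (?=\S+:)', "tvg-name: Teil 1, Kapitel: Eins"))
```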
Upvotes: 1
Reputation: 1264
Instead of a plain .split(','), we can use a regular expression to handle the split.

```python
import re

def convert(example):
    # Split only on ", " that is followed by a key such as "tvg-logo:"
    kv_pairs = re.split(r', (?=\w+-?\w+:)', example[1:-1])
    result = {}
    for kv_pair in kv_pairs:
        # maxsplit=1: only the first ': ' separates the key from the value
        key, value = kv_pair.split(': ', 1)
        result[key] = value
    return result
```

In re.split(r', (?=\w+-?\w+:)', example[1:-1]), we only split on commas that are followed by the pattern (?=\w+-?\w+:), for example tvg-logo:.
In key, value = kv_pair.split(': ', 1), we specify maxsplit=1, so that a further ': ' inside a value does not split it again. (URLs are safe either way, because the colon in https:// is followed by '/', not a space.)
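A small illustration of the maxsplit point, using a made-up pair whose value contains ': ' (not one of the question's strings) alongside the question's URL:

```python
# Hypothetical key-value pair whose value itself contains ": "
kv_pair = "tvg-name: Antonio: Teil 2"

print(kv_pair.split(': '))     # every ": " splits, giving three pieces
print(kv_pair.split(': ', 1))  # maxsplit=1: only the first ": " separates key and value

key, value = kv_pair.split(': ', 1)

# The colon in a URL is followed by "/", so ": " never matches inside it
url_pair = "tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg"
print(url_pair.split(': '))
```

Without maxsplit, unpacking the first split into key, value would raise a ValueError because there are three pieces.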
Hope it helps.
Upvotes: 2
Reputation: 9136
You can't really do this without some heuristics.
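Here's code that works:

```python
from typing import Dict, Optional


def convert(text: str) -> Dict[str, Optional[str]]:
    text = text.strip()[1:-1]  # Remove the curly braces {...}
    result: Dict[str, Optional[str]] = {}
    last_key: Optional[str] = None
    for pair in text.split(','):
        kv = pair.split(':', 1)
        if len(kv) == 1:
            # No ':' in this fragment, so the comma belonged to the previous value.
            # A leading fragment with no previous key is ambiguous and is skipped.
            if last_key is not None:
                result[last_key] = (result[last_key] or '') + ',' + pair.rstrip()
            continue
        key, value = kv[0].strip(), kv[1].strip()
        result[key] = value if value else None
        last_key = key
    return result
```

This works by only starting a new key when a comma-separated fragment contains a ':'; a fragment without one is glued back onto the previous value, comma included.
Note that this will still break on strings like '{ab,cd:ef,gh}', since it won't know what to do with 'ab', and a value in which a ':' follows a comma would be read as a new key. It's actually a bit ambiguous.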
To handle all cases correctly, the only option is to change the input source to quote the string if possible. If that's not possible, or if it's a one-time thing, you can try to extend the heuristics to cover all your cases.
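If the source can be changed to quote the values, for example by emitting JSON like the target format in the question (an assumption about what the source can produce), the standard json module parses it directly and no heuristics are needed:

```python
import json

# Hypothetical input: assumes the source is changed to quote every key and value (JSON)
quoted = '''{"tvg-id": "", "tvg-name": "Antonio, ihm schmeckt's nicht! (2016)", "tvg-logo": "https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg", "group-title": "2017-16-15 Germany Cinema"}'''

# Quoting makes commas and colons inside values unambiguous, so a real parser works
data = json.loads(quoted)

# Optionally map empty strings to None, as several answers above do
data = {key: value or None for key, value in data.items()}
print(data)
```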
Upvotes: 1