Reputation: 23
I'm trying to split records in python using split function but unable to achieve the actual outcome.
Here is the contents of my .txt
file in below:
10000 {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}
10001 {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}
Here is the desired output:
10000 10000,200,300,A
10000 10000,200,300,B
10000 10000,200,300,C
10000 10000,200,300,D
10001 10001,200,300,E
10001 10001,200,300,F
10001 10001,200,300,G
10001 10001,200,300,H
Any help would be appreciated, thanks.
Upvotes: 2
Views: 306
Reputation: 5031
Here is the simplest way to get the desired result, it only requires the sub
and findall
methods from the re
package to work.
from re import sub, findall
string = """
10000 {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}
10001 {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}
"""
# our results go here
results = []
# loop through each line in the string
for line in string.split("\n"):
# get rid of leading and trailing whitespace
line = line.strip()
# ignore empty lines
if len(line) > 0:
# get the line's id
id = line.split("{")[0].strip()
# get all values wrapped in parenthesis
for match in findall("(\(.*?\))", string):
# add the string to the results list
results.append("{} {}".format(id, sub(r"\{|\}", "", match)))
# display the results
print(results)
Here is the same code in function form:
from re import sub, findall
def get_records(string):
# our results go here
results = []
# loop through each line in the string
for line in string.split("\n"):
# get rid of leading and trailing whitespace
line = line.strip()
# ignore empty lines
if len(line) > 0:
# get the line's id
id = line.split("{")[0].strip()
# get all values wrapped in parenthesis
for match in findall("(\(.*?\))", string):
# add the string to the results list
results.append("{} {}".format(id, sub(r"\{|\}", "", match)))
# return the results list
return results
You would then use the function, like this:
# print the results
print(get_records("""
10000 {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}
10001 {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}
"""))
Good luck.
Upvotes: 2