Beekash Mohanty

Reputation: 23

How do I split records in Python?

I'm trying to split records in Python using the split function, but I can't get the output I want.

Here are the contents of my .txt file:

10000  {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}
10001  {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}

Here is the desired output:

10000  10000,200,300,A
10000  10000,200,300,B
10000  10000,200,300,C
10000  10000,200,300,D
10001  10001,200,300,E
10001  10001,200,300,F
10001  10001,200,300,G
10001  10001,200,300,H

Any help would be appreciated, thanks.

Upvotes: 2

Views: 306

Answers (1)

Malekai

Reputation: 5031

Here is a simple way to get the desired result. It only needs the sub and findall functions from the re module.

from re import sub, findall

string = """
  10000 {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}
  10001 {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}
"""

# our results go here
results = []

# loop through each line in the string
for line in string.split("\n"):
  # get rid of leading and trailing whitespace
  line = line.strip()
  # ignore empty lines
  if len(line) > 0:
    # get the line's id
    id = line.split("{")[0].strip()
    # get all values wrapped in parentheses, searching this line only
    for match in findall(r"\(.*?\)", line):
      # strip the parentheses and add the record to the results list
      results.append("{} {}".format(id, sub(r"[()]", "", match)))

# display the results, one record per line
print("\n".join(results))
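
To see what the pattern picks up, here is a small sketch that runs it against a single line of the sample input. The names line, groups and records are only for illustration. The non-greedy `.*?` makes each match stop at the first closing parenthesis, and searching one line at a time keeps each record paired with its own id.

```python
from re import findall, sub

# one line of the sample input (assumed to match the file's layout)
line = "10000 {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}"

# non-greedy match: each hit stops at the first closing parenthesis
groups = findall(r"\(.*?\)", line)
# strip the surrounding parentheses from every hit
records = [sub(r"[()]", "", group) for group in groups]
print(records)
```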

Here is the same code in function form:

from re import sub, findall

def get_records(string):
  # our results go here
  results = []
  # loop through each line in the string
  for line in string.split("\n"):
    # get rid of leading and trailing whitespace
    line = line.strip()
    # ignore empty lines
    if len(line) > 0:
      # get the line's id
      id = line.split("{")[0].strip()
      # get all values wrapped in parentheses, searching this line only
      for match in findall(r"\(.*?\)", line):
        # strip the parentheses and add the record to the results list
        results.append("{} {}".format(id, sub(r"[()]", "", match)))
  # return the results list
  return results

You would then use the function, like this:

# print the results, one record per line
print("\n".join(get_records("""
  10000 {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}
  10001 {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}
"""))))
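
Since the question mentions a .txt file, the same idea might also be written to work on any iterable of lines, such as an open file. This is only a sketch: split_records and the file name records.txt are hypothetical, and it assumes every record sits inside one pair of parentheses with no nesting. A capture group is used here so that findall returns the text between the parentheses directly.

```python
from io import StringIO
from re import findall

def split_records(lines):
    """Yield 'id record' strings from an iterable of text lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # the id is everything before the first opening brace
        record_id = line.split("{")[0].strip()
        # capture the text inside each pair of parentheses on this line
        for record in findall(r"\(([^()]*)\)", line):
            yield "{} {}".format(record_id, record)

# StringIO stands in for the real file here; with an actual file you
# could pass the object returned by open("records.txt") instead
# ("records.txt" is a hypothetical name).
sample = StringIO(
    "10000  {(10000,200,300,A),(10000,200,300,B)},{(10000,200,300,C),(10000,200,300,D)}\n"
    "10001  {(10001,200,300,E),(10001,200,300,F)},{(10001,200,300,G),(10001,200,300,H)}\n"
)
output = list(split_records(sample))
print("\n".join(output))
```

Iterating over the file object reads one line at a time, so the whole file never has to be held in a single string.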

Good luck.

Upvotes: 2
