niharika gadde

Reputation: 1

Split each line in a file based on delimiters

This is sample data from a file. I want to split each line and add it to a DataFrame. Some parents have more than one child, and for every additional child a new pair of columns has to be added (for example Child2_Name and DOB).

(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024
(P324) Shiva Bhupati 01/01/1994 – Vinitha B 04/08/2024
(P356) Karthikeyan chandrashekar 22/02/1991 – Kanishka P 10/03/2014
(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998 

This is the code I have tried, but it only takes the "–" separator into account:

with open("text.txt") as read_file:
    file_contents = read_file.readlines()
content_list = []
temp = []
for each_line in file_contents:
    temp = each_line.replace("–", " ").split()

    content_list.append(temp)

print(content_list)

Current output:

[['(P322)', 'Rashmika', 'Chadda', '15/05/1995', 'Rashmi', 'Chadda', 'Teega', '12/02/2024'], ['(P324)', 'Shiva', 'Bhupati', '01/01/1994', 'Vinitha', 'B', 'Sahu', '04/08/2024'], ['(P356)', 'Karthikeyan', 'chandrashekar', '22/02/1991', 'Kanishka', 'P', '10/03/2014'], ['(P366)', 'Kalyani', 'Manoj', '23/01/1975', '-', 'Vandana', 'M', '15/05/1995', '-', 'Chandana', 'M', '18/11/1998']]

The final output should look like this:

Code  Parent_Name                DOB         Child1_Name  DOB         Child2_Name  DOB
P322  Rashmika Chadda            15/05/1995  Rashmi C     12/02/2024
P324  Shiva Bhupati              01/01/1994  Vinitha B    04/08/2024
P356  Karthikeyan chandrashekar  22/02/1991  Kanishka P   10/03/2014
P366  Kalyani Manoj              23/01/1975  Vandana M    15/05/1995  Chandana M   18/11/1998

Upvotes: 0

Views: 87

Answers (1)

Ssayan

Reputation: 1043

I'm not sure whether you want the result as a list or as something else. To get lists:

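import re

# `text` is the list of lines read from the file
with open("text.txt", encoding="utf-8") as read_file:
    text = read_file.readlines()

result = []
for t in text:
    # remove the \n (and any trailing spaces) at the end of each line
    t = t.strip()
    if not t:
        continue  # skip blank lines
    # remove the parentheses around the code
    t = t.replace("(", "").replace(")", "")
    # split into people on " – " (the last sample line uses " - " instead)
    t = re.split(r" [–-] ", t)

    # reconstruct: the code first, then a name and a DOB per person
    for i, person in enumerate(t):
        person = person.split(" ")
        # the first chunk starts with the code
        if i == 0:
            res = [person.pop(0)]
        # assumes a two-word name followed by the date
        res.extend([" ".join(person[:2]), person[2]])

    result.append(res)

print(result)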

This gives the following output:

[['P322', 'Rashmika Chadda', '15/05/1995', 'Rashmi C', '12/02/2024'], ['P324', 'Shiva Bhupati', '01/01/1994', 'Vinitha B', '04/08/2024'], ['P356', 'Karthikeyan chandrashekar', '22/02/1991', 'Kanishka P', '10/03/2014'], ['P366', 'Kalyani Manoj', '23/01/1975', 'Vandana M', '15/05/1995', 'Chandana M', '18/11/1998']]

You can organise the data a bit more using a dictionary:

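import re
from pprint import pprint

# `text` is the list of lines read from the file
with open("text.txt", encoding="utf-8") as read_file:
    text = read_file.readlines()

result = {}
for t in text:
    # remove the \n (and any trailing spaces) at the end of each line
    t = t.strip()
    if not t:
        continue  # skip blank lines
    # remove the parentheses around the code
    t = t.replace("(", "").replace(")", "")
    # split into people on " – " or " - "
    t = re.split(r" [–-] ", t)

    for i, person in enumerate(t):
        # split the name and the date into words
        person = person.split(" ")
        if i == 0:
            # the first chunk is the parent and starts with the code
            code = person.pop(0)
            result[code] = {"parent_name": " ".join(person[:2]), "parent_DOB": person[2], "children": []}
        else:
            result[code]["children"].append({f"child{i}_name": " ".join(person[:2]), f"child{i}_DOB": person[2]})

# pprint prints the nested dictionary in a readable layout
pprint(result)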

Which would give this output:

{'P322': {'children': [{'child1_DOB': '12/02/2024',
    'child1_name': 'Rashmi C'}],
  'parent_DOB': '15/05/1995',
  'parent_name': 'Rashmika Chadda'},
 'P324': {'children': [{'child1_DOB': '04/08/2024',
    'child1_name': 'Vinitha B'}],
  'parent_DOB': '01/01/1994',
  'parent_name': 'Shiva Bhupati'},
 'P356': {'children': [{'child1_DOB': '10/03/2014',
    'child1_name': 'Kanishka P'}],
  'parent_DOB': '22/02/1991',
  'parent_name': 'Karthikeyan chandrashekar'},
 'P366': {'children': [{'child1_DOB': '15/05/1995',
    'child1_name': 'Vandana M'},
   {'child2_DOB': '18/11/1998', 'child2_name': 'Chandana M'}],
  'parent_DOB': '23/01/1975',
  'parent_name': 'Kalyani Manoj'}}

In the end, to get an actual table you would need to use pandas, but that requires fixing the maximum number of children so that you can pad the empty cells.
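The padding step could look roughly like the sketch below. It is only an illustration: the helper names `parse_line` and `to_table` and the column names `Parent_DOB` and `ChildN_DOB` are made up here (the question repeats `DOB`), and it assumes that people are separated by " – " or " - " and that the last word of each chunk is the date. Only two of the sample lines are used to keep it short.

```python
import re

# Two sample lines copied from the question; real input would come from the file.
lines = [
    "(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024",
    "(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998 ",
]


def parse_line(line):
    """Return [code, parent_name, parent_dob, child1_name, child1_dob, ...]."""
    # Split on an en dash or a hyphen that is surrounded by whitespace.
    parts = re.split(r"\s+[–-]\s+", line.strip())
    # The first chunk is "(code) parent name date".
    code, _, parent = parts[0].partition(" ")
    row = [code.strip("()")]
    for person in [parent] + parts[1:]:
        # Assumption: the last word is the date, everything before it is the name.
        name, _, dob = person.rpartition(" ")
        row.extend([name, dob])
    return row


def to_table(lines):
    """Return (columns, rows) with every row padded to the same width."""
    rows = [parse_line(line) for line in lines if line.strip()]
    # The largest family decides how many child columns are needed.
    max_children = max(((len(row) - 3) // 2 for row in rows), default=0)
    columns = ["Code", "Parent_Name", "Parent_DOB"]
    for n in range(1, max_children + 1):
        columns += [f"Child{n}_Name", f"Child{n}_DOB"]
    # Pad shorter rows with None so each row has one value per column.
    padded = [row + [None] * (len(columns) - len(row)) for row in rows]
    return columns, padded


columns, rows = to_table(lines)
```

With pandas installed, `pd.DataFrame(rows, columns=columns)` should then give the table, with the padded cells showing up as missing values. Computing the maximum from the data avoids hard-coding the number of children, at the cost of reading all lines before building the table.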

Upvotes: 0
