user10835913

How to split a list of list by numbers?

my_list = ['Rob Kardashian 00052369 1987-03-17 Reality Star',
'Brooke Barry 00213658 2001-03-30 TikTok Star',
'Bae De Leon 00896351 1997-08-02 Volleyball Player',
'Jonas Blue 02369785 1990-08-02 Music Producer']

I have a list of strings, each containing a person's name, ID, date of birth, and occupation. I want to split each string into those four fields.

I tried a crude approach that only does part of the job, and I was wondering whether there is a better solution.

Below is my code:

import re 

def remove(my_list): 
    pattern = '[0-9]'
    my_list = [re.sub(pattern, '', i) for i in my_list] 
    return my_list

print(remove(my_list))

But this removes the numbers entirely: ['Rob Kardashian  -- Reality Star', 'Brooke Barry  -- TikTok Star', 'Bae De Leon  -- Volleyball Player', 'Jonas Blue  -- Music Producer']

Then, I removed the ' -- '

[s.replace(' -- ',' ') for s in remove(my_list)]

['Rob Kardashian  Reality Star','Brooke Barry  TikTok Star','Bae De Leon  Volleyball Player','Jonas Blue  Music Producer']

My expected output would be a DataFrame with one column per field:

pd.DataFrame(my_list)

Thanks for your help.

Upvotes: 3

Views: 62

Answers (1)

Ajax1234

Reputation: 71451

You can use re.split:

import re
my_list = ['Rob Kardashian 00052369 1987-03-17 Reality Star', 'Brooke Barry 00213658 2001-03-30 TikTok Star', 'Bae De Leon 00896351 1997-08-02 Volleyball Player','Jonas Blue 02369785 1990-08-02 Music Producer']
# Raw string so that \s and \d reach the regex engine unchanged
new_l = [re.split(r'\s(?=\d)|(?<=\d)\s', i) for i in my_list]

Output:

[['Rob Kardashian', '00052369', '1987-03-17', 'Reality Star'], 
 ['Brooke Barry', '00213658', '2001-03-30', 'TikTok Star'], 
 ['Bae De Leon', '00896351', '1997-08-02', 'Volleyball Player'], 
 ['Jonas Blue', '02369785', '1990-08-02', 'Music Producer']]
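
Since the question asks for labelled columns, here is a possible alternative sketch that uses named groups instead of re.split. The group names (name, id, dob, occupation) and the ROW pattern are my own assumptions for illustration, and the pattern assumes every line has exactly the layout shown in the question (name, all-digit ID, YYYY-MM-DD date, occupation).

```python
import re

my_list = ['Rob Kardashian 00052369 1987-03-17 Reality Star',
           'Brooke Barry 00213658 2001-03-30 TikTok Star',
           'Bae De Leon 00896351 1997-08-02 Volleyball Player',
           'Jonas Blue 02369785 1990-08-02 Music Producer']

# Hypothetical pattern: the group names are illustrative, not from the original answer.
ROW = re.compile(
    r'(?P<name>.+?)\s'                  # name: shortest text before the ID
    r'(?P<id>\d+)\s'                    # ID: a run of digits
    r'(?P<dob>\d{4}-\d{2}-\d{2})\s'     # date of birth: YYYY-MM-DD
    r'(?P<occupation>.+)'               # occupation: everything that remains
)

records = []
for line in my_list:
    m = ROW.fullmatch(line)
    if m is None:
        # Fail loudly on lines that do not follow the assumed layout.
        raise ValueError(f'unexpected format: {line!r}')
    records.append(m.groupdict())

print(records[0])
```

Each element of records is a dict keyed by field name, so the list should be usable directly as input to pd.DataFrame(records) if pandas is available. The nested lists in new_l above can be handled similarly by passing a columns argument.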

Regex explanation:

\s(?=\d): matches any instance of a space followed by a digit.

| (alternation): tries the expression on its left first, then the expression on its right, and uses whichever matches first at the current position.

(?<=\d)\s: matches any instance of a space preceded by a digit.
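
To see what each half of the pattern contributes, the following sketch splits one of the sample strings with each alternative on its own and then with both combined. The variable names are only illustrative.

```python
import re

s = 'Rob Kardashian 00052369 1987-03-17 Reality Star'

# Only spaces that are followed by a digit
before_digit = re.split(r'\s(?=\d)', s)

# Only spaces that are preceded by a digit
after_digit = re.split(r'(?<=\d)\s', s)

# Either condition: the full pattern from the answer
both = re.split(r'\s(?=\d)|(?<=\d)\s', s)

print(before_digit)  # ['Rob Kardashian', '00052369', '1987-03-17 Reality Star']
print(after_digit)   # ['Rob Kardashian 00052369', '1987-03-17', 'Reality Star']
print(both)          # ['Rob Kardashian', '00052369', '1987-03-17', 'Reality Star']
```

Neither lookaround consumes the digit itself, which is why the ID and the date come through intact. Note that this relies on names and occupations containing no digits.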

Upvotes: 3
