user3476463

Reputation: 4575

Create a list from a list based on a string pattern

I have a list like the example data below. Every entry follows the pattern 'source/number_something/'. I would like to create a new list like the output below, where each entry is just the "something". I was thinking I could use a for loop and split each string on '_', but some of the "something" parts also contain '_'. This seems like something that could be done with regex, but I'm not that good at regex. Any tips are greatly appreciated.

example data:

['source/108_cash_total/',
 'source/108_customer/',
 'source/108_daily_units_total/',
 'source/108_discounts/',
 'source/108_employee/',
 'source/56_cash_total/',
 'source/56_customer/',
 'source/56_daily_units_total/',
 'source/56_discounts/',
 'source/56_employee/']

output:

['cash_total',
 'customer',
 'daily_units_total',
 'discounts',
 'employee',
 'cash_total',
 'customer',
 'daily_units_total',
 'discounts',
 'employee']
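The difficulty with splitting on '_' is easy to see on one of the example entries. This small sketch only illustrates the problem; the entry is taken from the example data above.

```python
item = 'source/108_daily_units_total/'

# A plain split breaks the wanted name apart at every underscore,
# so 'daily_units_total' ends up spread over three list elements.
print(item.split('_'))  # ['source/108', 'daily', 'units', 'total/']
```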

Upvotes: 2

Views: 622

Answers (3)

Jan

Reputation: 43169

You can use a regular expression:

\d+_([^/]+)

See a demo on regex101.com.
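To make the pattern easier to read, here is the same expression written with the re.VERBOSE flag, which lets each part carry its own comment. This is only an annotated restatement of the pattern above, tested on one entry from the question.

```python
import re

# Same pattern as \d+_([^/]+), with whitespace and comments allowed by re.VERBOSE.
rx = re.compile(r"""
    \d+       # one or more digits (the number, e.g. 108)
    _         # the first underscore after the number
    ([^/]+)   # group 1: every character up to the next slash
""", re.VERBOSE)

m = rx.search('source/108_daily_units_total/')
print(m.group(1))  # daily_units_total
```

Because [^/]+ stops only at a slash, underscores inside the name are captured along with everything else.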


In Python:

import re

lst = ['source/108_cash_total/',
       'source/108_customer/',
       'source/108_daily_units_total/',
       'source/108_discounts/',
       'source/108_employee/',
       'source/56_cash_total/',
       'source/56_customer/',
       'source/56_daily_units_total/',
       'source/56_discounts/',
       'source/56_employee/']

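rx = re.compile(r'\d+_([^/]+)')

# `for match in [rx.search(item)]` binds the search result to a name,
# so it can be tested (if match) and used (match.group(1))
# without calling rx.search twice.
output = [match.group(1)
          for item in lst
          for match in [rx.search(item)]
          if match]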
print(output)

Which yields

['cash_total', 'customer', 'daily_units_total', 
 'discounts', 'employee', 'cash_total', 'customer',
 'daily_units_total', 'discounts', 'employee']
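On Python 3.8 or newer, the one-element-list trick can be replaced by an assignment expression (the walrus operator). A minimal sketch; the entry 'no_match_here' is a made-up value included only to show that items without a match are skipped.

```python
import re

lst = ['source/108_cash_total/', 'source/56_daily_units_total/', 'no_match_here']

rx = re.compile(r'\d+_([^/]+)')

# The condition runs first and binds the search result to m;
# the element expression then uses m only when the search succeeded.
output = [m.group(1) for item in lst if (m := rx.search(item))]
print(output)  # ['cash_total', 'daily_units_total']
```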

Upvotes: 6

sahasrara62

Reputation: 11228

Probably not as clean as the regex approach, but here is a version using a list comprehension and the split function:

lst = ['source/108_cash_total/',
       'source/108_customer/',
       'source/108_daily_units_total/',
       'source/108_discounts/',
       'source/108_employee/',
       'source/56_cash_total/',
       'source/56_customer/',
       'source/56_daily_units_total/',
       'source/56_discounts/',
       'source/56_employee/']

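# Split on every '_', rejoin everything after the first piece with '_',
# then keep only the part before the trailing '/'.
res = ['_'.join(i.split('_')[1:]).split('/')[0] for i in lst]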

print(res)

# output ['cash_total', 'customer', 'daily_units_total', 'discounts', 'employee', 'cash_total', 'customer', 'daily_units_total', 'discounts', 'employee']
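A shorter variant of the same idea uses str.partition, which splits only at the first '_', so there is nothing to rejoin. This sketch makes the same assumption as the split version: the part before the number contains no underscore. The list is a subset of the question's data.

```python
lst = ['source/108_cash_total/', 'source/108_customer/', 'source/56_daily_units_total/']

# partition('_') returns (before, '_', after) for the FIRST underscore,
# so underscores inside the name survive; rstrip('/') drops the trailing slash.
res = [item.partition('_')[2].rstrip('/') for item in lst]
print(res)  # ['cash_total', 'customer', 'daily_units_total']
```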

Upvotes: 0

Tupteq

Reputation: 3095

You can do this without regex, using a fixed offset and split() with the maxsplit argument set to 1:

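offset = len("source/")  # length of the fixed prefix
result = []
for item in lst:
    # Drop the prefix, then split only at the first '_' (maxsplit=1),
    # so underscores inside the name are preserved.
    num, data = item[offset:].split("_", 1)
    result.append(data[:-1])  # strip the trailing '/'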

Of course, this is not very flexible: it assumes every entry starts with exactly "source/" and ends with "/". As long as your data follow that schema, it doesn't matter.
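If the directory prefix might vary, one way to drop the fixed offset is to treat each entry as a path and take its last component with pathlib. This is a sketch, not part of the answer above; the entry 'other/dir/56_daily_units_total/' is hypothetical and only shows a different prefix.

```python
from pathlib import PurePosixPath

lst = ['source/108_cash_total/', 'other/dir/56_daily_units_total/']

result = []
for item in lst:
    # .name is the last path component; the trailing slash is ignored,
    # so 'source/108_cash_total/' gives '108_cash_total'.
    name = PurePosixPath(item).name
    # Split once on '_' to separate the number from the rest of the name.
    _num, data = name.split('_', 1)
    result.append(data)
print(result)  # ['cash_total', 'daily_units_total']
```

This still assumes the last component has the form number_something; an entry without '_' would make the unpacking raise ValueError.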

Upvotes: 0
