Reputation: 4575
I have a list like the example data below. Every entry in the list follows the pattern 'source/number_something/'. I would like to create a new list like the output below, where the entries are just the "something". I was thinking I could use a for loop and split each string on '_', but some of the texts that follow the number also include '_'. This seems like something that could be done with regex, but I'm not that good at regex. Any tips are greatly appreciated.
example data:
['source/108_cash_total/',
'source/108_customer/',
'source/108_daily_units_total/',
'source/108_discounts/',
'source/108_employee/',
'source/56_cash_total/',
'source/56_customer/',
'source/56_daily_units_total/',
'source/56_discounts/',
'source/56_employee/']
output:
['cash_total',
'customer',
'daily_units_total',
'discounts',
'employee',
'cash_total',
'customer',
'daily_units_total',
'discounts',
'employee']
Upvotes: 2
Views: 622
Reputation: 43169
You can use a regular expression:
\d+_([^/]+)
In Python:
import re
lst = ['source/108_cash_total/',
'source/108_customer/',
'source/108_daily_units_total/',
'source/108_discounts/',
'source/108_employee/',
'source/56_cash_total/',
'source/56_customer/',
'source/56_daily_units_total/',
'source/56_discounts/',
'source/56_employee/']
rx = re.compile(r'\d+_([^/]+)')
output = [match.group(1)
          for item in lst
          for match in [rx.search(item)]
          if match]
print(output)
Which yields
['cash_total', 'customer', 'daily_units_total',
'discounts', 'employee', 'cash_total', 'customer',
'daily_units_total', 'discounts', 'employee']
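If the pattern looks cryptic, it may help to see it spelled out piece by piece. The sketch below is the same regex written with re.VERBOSE so each part can carry a comment. The names sample and extracted, and the entry 'no_number_here/', are made up for illustration and are not part of the original data.

```python
import re

# Same pattern as above, written in verbose mode so each part can be commented.
rx = re.compile(r"""
    \d+       # one or more digits (the number after 'source/')
    _         # the first underscore that follows the digits
    ([^/]+)   # capture everything up to, but not including, the next '/'
""", re.VERBOSE)

# Hypothetical sample: two entries in the question's format, one that is not.
sample = ['source/108_daily_units_total/', 'source/56_customer/', 'no_number_here/']

# rx.search returns None when nothing matches, so those entries are skipped.
extracted = [m.group(1) for m in map(rx.search, sample) if m]
print(extracted)  # ['daily_units_total', 'customer']
```

Because the capture group stops at the first '/', any underscores inside the name are kept, which is what a plain split on '_' would get wrong.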
Upvotes: 6
Reputation: 11228
This is probably not as clean as the regex approach, but it works using a list comprehension and the split function:
lst = ['source/108_cash_total/',
'source/108_customer/',
'source/108_daily_units_total/',
'source/108_discounts/',
'source/108_employee/',
'source/56_cash_total/',
'source/56_customer/',
'source/56_daily_units_total/',
'source/56_discounts/',
'source/56_employee/']
res = [ '_'.join(i.split('_')[1:]).split('/')[:-1][0] for i in lst]
print(res)
# output ['cash_total', 'customer', 'daily_units_total', 'discounts', 'employee', 'cash_total', 'customer', 'daily_units_total', 'discounts', 'employee']
Upvotes: 0
Reputation: 3095
You can easily do this without regex, using only offsets and split() with the maxsplit parameter set:
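# lst is the same list as in the question
offset = len("source/")
result = []
for item in lst:
    # split on the first '_' only, so underscores in the name are kept
    _, data = item[offset:].split("_", 1)
    result.append(data[:-1])  # drop the trailing '/'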
Of course, it's not very flexible: it assumes every item starts with "source/" and ends with "/". As long as your data follow that schema, it doesn't matter.
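If the prefix might not always be exactly "source/", one possible variation (just a sketch, not something the question requires) is to take the last path component first and then cut at the first underscore. The helper name strip_source_and_number and the list lst_sample are hypothetical.

```python
def strip_source_and_number(item):
    """Return the text after the first '_' in the last path component."""
    # Drop any trailing '/', then keep only the last component, e.g. '108_cash_total'.
    last_part = item.rstrip("/").rsplit("/", 1)[-1]
    # partition cuts at the first '_' only; name is '' if there is no '_' at all.
    _, _, name = last_part.partition("_")
    return name

# Hypothetical sample, including a prefix other than 'source/'.
lst_sample = ['source/108_cash_total/', 'source/56_daily_units_total/', 'other/dir/56_employee']

result = [strip_source_and_number(item) for item in lst_sample]
print(result)  # ['cash_total', 'daily_units_total', 'employee']
```

This trades the fixed offset for two string operations, so it tolerates a different or nested prefix and a missing trailing slash, but it still assumes the name comes after the first underscore of the last component.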
Upvotes: 0