Fazli

Reputation: 171

Splitting list elements based on substring

How do I split the elements of this list into groups based on the substring before the dot, without hard-coding each prefix?

lst = ['ds_a.cola','ds_a.colb','ds_b.cola','ds_b.colb']

Since there are two variants of 'ds', I want two lists:

lst_dsa = ['ds_a.cola','ds_a.colb']
lst_dsb = ['ds_b.cola','ds_b.colb']

My old code was:

lst_dsa = []
lst_dsb = []
for item in lst:
    if "ds_a" in item:
        lst_dsa.append(item)
    else:
        lst_dsb.append(item)

But I can't use this since there might be more than two prefixes, such as ds_c, ds_d, and so on. How do I achieve this in Python?

Upvotes: 3

Views: 875

Answers (5)

Use one regular expression with two repeating groups: the first matches a run of ds_a.<word> entries and the second a run of ds_b.<word> entries. The empty match at the end contributes nothing, and a defaultdict of sets collects the values under each prefix.

import re
from collections import defaultdict

lst = ['ds_a.cola','ds_a.colb','ds_b.cola','ds_b.colb']
pattern = r"(?:\bds_a\.\w+\b\s*)*(?:\bds_b\.\w+\b\s*)*"

string = " ".join(lst)

groups = re.findall(pattern, string)
result = defaultdict(set)  # not named dict, to avoid shadowing the built-in
for group in groups:
    # split() yields no items for the empty match, so no extra check is needed
    for item in group.split():
        key, *value = item.split('.')
        result[key].add(value[0])

print(result)

output:

 defaultdict(<class 'set'>, {'ds_a': {'cola', 'colb'}, 'ds_b': {'cola', 'colb'}})
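The pattern above still names ds_a and ds_b explicitly, which the question wants to avoid. A minimal sketch of a variant that captures whatever comes before the first dot, so a new prefix such as ds_c needs no code change. The names PREFIX_RE and group_by_prefix are illustrative and not part of the answer, and skipping entries without a dot is an assumption:

```python
import re
from collections import defaultdict

# Illustrative pattern: group 1 is the text before the first dot,
# group 2 is everything after it.
PREFIX_RE = re.compile(r"^([^.]+)\.(.+)$")

def group_by_prefix(items):
    """Collect the part after the dot under the part before it."""
    groups = defaultdict(set)
    for item in items:
        match = PREFIX_RE.match(item)
        if match is None:
            continue  # assumption: entries without a dot are ignored
        prefix, column = match.groups()
        groups[prefix].add(column)
    return dict(groups)

lst = ['ds_a.cola', 'ds_a.colb', 'ds_b.cola', 'ds_b.colb', 'ds_c.cola']
result = group_by_prefix(lst)
```

Here result maps each prefix found in the list to the set of names that followed it.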

Upvotes: 0

Shireen

Reputation: 177

Try this:

lst = ['ds_a.cola','ds_a.colb','ds_b.cola','ds_b.colb']
d = {}
for item in lst:
    key = item.split(".")[0]  # text before the first dot
    if key not in d:
        d[key] = []
    d[key].append(item)
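To get from the dictionary back to the separate lists the question asks for, the groups can be read out by key. A small sketch, assuming the same input list as the question; dict.setdefault is used here as a compact stand-in for the membership check, and lst_dsa and lst_dsb are the names from the question:

```python
lst = ['ds_a.cola', 'ds_a.colb', 'ds_b.cola', 'ds_b.colb']

d = {}
for item in lst:
    key = item.split(".")[0]
    # setdefault returns the existing list for key, or stores and returns a new one
    d.setdefault(key, []).append(item)

# Look up individual groups by prefix, or iterate over all of them.
lst_dsa = d['ds_a']
lst_dsb = d['ds_b']
all_groups = list(d.values())
```

Iterating over d.values() avoids naming any prefix, so it keeps working when ds_c or ds_d appear.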

Upvotes: 0

balderman

Reputation: 23815

Use a dict to hold the data, keyed by the part before the dot:

from collections import defaultdict
lst = ['ds_a.cola','ds_a.colb','ds_b.cola','ds_b.colb','ds_x.cola','ds_x.colb']
data = defaultdict(list)
for entry in lst:
    a,_ = entry.split('.')
    data[a].append(entry)
print(data)

output

defaultdict(<class 'list'>, {'ds_a': ['ds_a.cola', 'ds_a.colb'], 'ds_b': ['ds_b.cola', 'ds_b.colb'], 'ds_x': ['ds_x.cola', 'ds_x.colb']})
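One caveat: unpacking entry.split('.') into exactly two names raises ValueError when an entry has no dot or more than one. A hedged sketch of the same grouping with str.partition, which always returns three parts. The extra entries 'ds_a.col.b' and 'nodot' are made up for illustration, and treating a dotless entry as its own prefix is an assumption:

```python
from collections import defaultdict

# 'ds_a.col.b' and 'nodot' are hypothetical entries added to show the edge cases.
lst = ['ds_a.cola', 'ds_a.col.b', 'ds_b.cola', 'nodot']

data = defaultdict(list)
for entry in lst:
    # partition splits at the first dot only; with no dot, prefix is the whole entry
    prefix, _sep, _rest = entry.partition('.')
    data[prefix].append(entry)

result = dict(data)
```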

Upvotes: 3

David Meu

Reputation: 1545

You can map each prefix to a list of its column names:

from collections import defaultdict

lst = ['ds_a.cola','ds_a.colb','ds_b.cola','ds_b.colb']
ds_dict = defaultdict(list)

for item in lst:
    key, value = item.split(".")
    ds_dict[key].append(value)

print(dict(ds_dict))

Output:

{'ds_a': ['cola', 'colb'], 'ds_b': ['cola', 'colb']}

Upvotes: 0

U13-Forward

Reputation: 71580

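Try itertools.groupby, which groups runs of adjacent items with the same key (as in the example list):

>>> from itertools import groupby
>>> lst = ['ds_a.cola','ds_a.colb','ds_b.cola','ds_b.colb']
>>> [list(v) for _, v in groupby(lst, key=lambda x: x[x.find('_') + 1])]
[['ds_a.cola', 'ds_a.colb'], ['ds_b.cola', 'ds_b.colb']]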
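Two things to watch with this approach: groupby only merges adjacent items, and the key above looks at a single character after the underscore, so ds_a and ds_ab would share a group. A sketch that sorts first and keys on the whole prefix. The helper name prefix and the shuffled input with ds_ab are assumptions for illustration:

```python
from itertools import groupby

def prefix(item):
    """Return everything before the first dot."""
    return item.split('.', 1)[0]

# Hypothetical unsorted input with a multi-character suffix (ds_ab).
lst = ['ds_b.cola', 'ds_a.cola', 'ds_b.colb', 'ds_a.colb', 'ds_ab.cola']

# Sort by the same key so equal prefixes become adjacent before grouping.
groups = [list(v) for _, v in groupby(sorted(lst, key=prefix), key=prefix)]
```

Because sorted is stable, items within each group keep their original relative order.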

Upvotes: 2
