Solar

Reputation: 11

Split list of strings to list of lists including original strings in Python

This is similar to splitting a list of strings into a list of lists of strings, but I want each resulting list to also contain a copy of the original string it came from. The purpose is to parse elements out of a filename while retaining the filename itself, so that after I match a list by its words, the filename is readily available to do something with.

For example,

stringList = ["wordA1_wordA2_wordA3","wordB1_wordB2_wordB3"]

becomes

splitList = [["wordA1_wordA2_wordA3","wordA1","wordA2","wordA3"],
             ["wordB1_wordB2_wordB3","wordB1","wordB2","wordB3"]]

I'm trying to do it in a single command, as a list comprehension.

The closest I've gotten is:

splitList = [[item,item.split('_')] for item in stringList]

which yields:

splitList = [["wordA1_wordA2_wordA3",["wordA1","wordA2","wordA3"]],
             ["wordB1_wordB2_wordB3",["wordB1","wordB2","wordB3"]]]

I could work with this, but is there a more elegant suggestion that I could learn from?

I've tried

splitList = [item.split('_') + item for item in stringList]

which raises a TypeError, because a str cannot be concatenated to a list (can only concatenate list (not "str") to list).

And

splitList = [item.split('_').append(item) for item in stringList]

which creates a list of None values.

Upvotes: 1

Views: 2584

Answers (2)

asikorski

Reputation: 922

You can unpack the split list with *:

splitList=[[item,*item.split('_')] for item in stringList]

which gives the desired result:

splitList = [["wordA1_wordA2_wordA3","wordA1","wordA2","wordA3"],
             ["wordB1_wordB2_wordB3","wordB1","wordB2","wordB3"]]

You can also do something like:

splitList=[[item] + item.split('_') for item in stringList]

This avoids the error from concatenating a string and a list: [item] creates a one-element list containing item, which is then concatenated with the split list.
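As a quick check, the two forms can be compared side by side. This is only a minimal sketch using the question's sample data; the names unpacked and concatenated are made up for the comparison. Note that unpacking with * inside a list needs Python 3.5 or later (PEP 448), while the [item] + ... form also works on older versions.

```python
stringList = ["wordA1_wordA2_wordA3", "wordB1_wordB2_wordB3"]

# Unpacking form: requires Python 3.5+ (PEP 448)
unpacked = [[item, *item.split('_')] for item in stringList]

# Concatenation form: wrap the string in a one-element list first
concatenated = [[item] + item.split('_') for item in stringList]

print(unpacked == concatenated)  # True
print(unpacked[0])  # ['wordA1_wordA2_wordA3', 'wordA1', 'wordA2', 'wordA3']
```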

Upvotes: 2

C.Nivs

Reputation: 13106

The reason [item.split('_').append(item) ...] produces a list of None values is that list.append modifies the list in place and returns None.
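A small illustration of that behaviour, using one of the strings from the question (the variable names parts and result are only for this example):

```python
item = "wordA1_wordA2_wordA3"

parts = item.split('_')
result = parts.append(item)  # append mutates parts in place and returns None

print(result)  # None
print(parts)   # ['wordA1', 'wordA2', 'wordA3', 'wordA1_wordA2_wordA3']
```

Even when used as a separate statement, append would put the filename at the end of the list rather than at the front.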

A dict may suit this better than a list of lists, since the filename can be the key and its individual components the value:

stringList = ["wordA1_wordA2_wordA3","wordB1_wordB2_wordB3"]

string_dict = {filename: filename.split("_") for filename in stringList}

# {'wordA1_wordA2_wordA3': ['wordA1', 'wordA2', 'wordA3'], 'wordB1_wordB2_wordB3': ['wordB1', 'wordB2', 'wordB3']}
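With the dict, finding the filenames whose components include a given word could look like the sketch below. The search term wanted and the name matches are hypothetical, chosen just for illustration.

```python
stringList = ["wordA1_wordA2_wordA3", "wordB1_wordB2_wordB3"]
string_dict = {filename: filename.split("_") for filename in stringList}

wanted = "wordB2"  # hypothetical search term

# Keep the filenames whose components include the search term
matches = [filename for filename, parts in string_dict.items() if wanted in parts]

print(matches)  # ['wordB1_wordB2_wordB3']
```

This assumes the filenames are unique, since duplicate keys in a dict collapse into a single entry.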

However, if you need a list:

processed_list = [[filename, *filename.split("_")] for filename in stringList]

# [['wordA1_wordA2_wordA3', 'wordA1', 'wordA2', 'wordA3'], ['wordB1_wordB2_wordB3', 'wordB1', 'wordB2', 'wordB3']]

Here, [filename, *filename.split("_")] uses * to unpack the list returned by str.split into the enclosing list.

Upvotes: 1
