Ando Jurai

Reputation: 1049

splitting a list inside a comprehension to perform processing

I want to build two lists from a document that may vary in formatting but should roughly be two columns with some separator. Each row is:

"word1"\t"word2"

for example. My lists should be `list_of_word1` and `list_of_word2`, and I want to build both in one pass. I know that I could use pandas, but for a specific reason (the script should work without special imports, only the standard library), I also need to use regular file opening.

My attempt was:

list_of_word1=[]
list_of_word2=[]
((list_of_word1.extend(line.split()[0]),list_of_word2.extend(line.split()[1])) for line in open(doc))

The generator doesn't serve any purpose here, since extend returns None, so it may be seen as bad style to use a form that won't be consumed, or that is unnecessary in the first place. Besides, I would like to know how to avoid calling the split function repeatedly: twice per line is "ok", but if I applied the same principle to more columns, it would become very inefficient.
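For reference, the plain-loop form this is trying to compress would split each line only once (a sketch with a hypothetical sample file, not part of the question):

```python
# Create a small tab-separated sample file (hypothetical data).
with open("pairs.txt", "w") as f:
    f.write("alpha\tbeta\ngamma\tdelta\n")

list_of_word1 = []
list_of_word2 = []
with open("pairs.txt") as f:
    for line in f:
        # split once per line; split() with no argument also strips the newline
        word1, word2 = line.split()
        list_of_word1.append(word1)
        list_of_word2.append(word2)
```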

My attempt to avoid reusing split was to write it like this:

((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for line in open(doc) for (linesplit0,linesplit1) in line.split("\t"))

but that indeed doesn't work, since there are no tuples to unpack. I also tried starred unpacking, but that doesn't work either.

((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for linesplit0,linesplit1 in open(doc).readline().split("\n").split("\t"))

But that somehow feels unsatisfactory, too contrived. What do you think?

Upvotes: 0

Views: 85

Answers (4)

Robbie

Reputation: 4882

This answer will work regardless of the delimiter used (provided it is some number of spaces!)

with open('temp.txt', 'r') as f:
    data = f.read().strip('\n').split('\n')

# filter drops the empty strings produced by consecutive spaces
dataNoSpace = [filter(lambda a: a != '', i.split(' ')) for i in data]
list1, list2 = [list(i) for i in zip(*dataNoSpace)]

For example, if 'temp.txt' is:

word10 word20
word11    word21
word12       word22
word13  word23
word14    word24

We get:

list1
['word10', 'word11', 'word12', 'word13', 'word14']

list2
['word20', 'word21', 'word22', 'word23', 'word24']
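As a side note, `str.split()` with no argument already splits on any run of whitespace and never yields empty strings, so the `filter` step could be dropped (a sketch over the same kind of input):

```python
# Sample file with varying runs of spaces between the columns.
with open("temp.txt", "w") as f:
    f.write("word10 word20\nword11    word21\nword12       word22\n")

# split() with no argument handles any amount of whitespace,
# so no filtering of empty strings is needed.
with open("temp.txt") as f:
    rows = [line.split() for line in f if line.strip()]

list1, list2 = [list(col) for col in zip(*rows)]
```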

Upvotes: 1

mkrieger1

Reputation: 23264

You can use zip together with argument unpacking to achieve this.

Example input file data.txt:

1 2 3
apple orange banana
one two three
a b c

Code:

>>> with open('data.txt') as f:
...    list(zip(*(line.split() for line in f)))
... 
[('1', 'apple', 'one', 'a'), ('2', 'orange', 'two', 'b'), ('3', 'banana', 'three', 'c')]
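To land in named lists rather than tuples, the transposed columns can be unpacked directly (a sketch using the same `data.txt` contents):

```python
# Recreate the sample data.txt from above.
with open("data.txt", "w") as f:
    f.write("1 2 3\napple orange banana\none two three\na b c\n")

# zip(*...) transposes rows into columns; wrapping each column
# in list() and unpacking gives one named list per column.
with open("data.txt") as f:
    col1, col2, col3 = (list(col) for col in zip(*(line.split() for line in f)))
```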


Upvotes: 1

Sufian Latif

Reputation: 13356

Maybe this?

lists = [[] for i in range(<number_of_lists>)]
[[z[0].append(z[1]) for z in zip(lists, line.split())] for line in open(doc)]

(might need some tweaking)
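A runnable loop version of the same idea, for any number of columns (a sketch with a hypothetical three-column file):

```python
# Hypothetical three-column input.
with open("cols.txt", "w") as f:
    f.write("a b c\nd e f\n")

number_of_lists = 3
lists = [[] for _ in range(number_of_lists)]
with open("cols.txt") as f:
    for line in f:
        # append mutates each per-column list in place
        for target, word in zip(lists, line.split()):
            target.append(word)
```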

Upvotes: 1

Ando Jurai

Reputation: 1049

Actually, at first I wanted to use zip, hence the generator. But I mixed things up and ended up adding

list_of_word1 = []
list_of_word2 = []

which are useless like that. What should be done is:

list_of_word1, list_of_word2 = zip(*(line.split() for line in open(doc)))

That works like a charm. Still, the fundamental problem remains: while I could do what I wanted, I still don't know how to handle a split-and-unpack inside a comprehension. If you have any idea...?
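One small refinement of the zip line: since zip yields tuples, wrapping it in `map(list, ...)` produces actual lists, and the tab split can live in the same expression (a sketch with a hypothetical sample file):

```python
# Sample tab-separated file, matching the question's format.
with open("doc.txt", "w") as f:
    f.write("alpha\tbeta\ngamma\tdelta\n")

# rstrip removes the trailing newline before splitting on tabs;
# map(list, ...) turns zip's tuples into real lists.
with open("doc.txt") as f:
    list_of_word1, list_of_word2 = map(
        list, zip(*(line.rstrip("\n").split("\t") for line in f))
    )
```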

Upvotes: 0
