Reputation: 1049
I want to build two lists from a document whose formatting may vary, but which should roughly consist of two columns with some separator. Each row looks like:
"word1"\t"word2"
for example. My lists should be list_of_word1 and list_of_word2, and I want to build them in a single pass. I know I could use pandas, but the script has to work without third-party imports, using only the standard library, so I need to open and read the file the regular way.
My attempt was:
list_of_word1=[]
list_of_word2=[]
((list_of_word1.extend(line.split()[0]),list_of_word2.extend(line.split()[1])) for line in open(doc))
The generator expression serves no purpose, since extend returns None, so it may be bad practice to use a construct whose result is never used and that might be unnecessary in the first place. I would also like to know how to avoid calling split more than once per line: twice per line is acceptable, but if I applied the same approach to more columns it would become very inefficient.
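For reference, here is a small sketch of what the first attempt actually does. It assumes two hard-coded lines in place of open(doc) so that it runs on its own. It shows two separate issues: a generator expression does nothing until something consumes it, and extend() with a string adds one element per character.

```python
# Two hard-coded lines stand in for open(doc) (assumption, for a standalone run).
lines = ["apple\torange\n", "one\ttwo\n"]

list_of_word1 = []
list_of_word2 = []

# Building the generator runs nothing: generator expressions are lazy.
gen = ((list_of_word1.extend(line.split()[0]),
        list_of_word2.extend(line.split()[1])) for line in lines)
print(list_of_word1)  # []

# Consuming it exposes the second problem: extend() with a string adds
# each character separately, whereas append() would add the whole word.
list(gen)
print(list_of_word1)  # ['a', 'p', 'p', 'l', 'e', 'o', 'n', 'e']
```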
My attempt to avoid repeating split was this:
((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for line in open(doc) for (linesplit0,linesplit1) in line.split("\t"))
but that doesn't work, because iterating over line.split("\t") yields single strings rather than pairs to unpack. I also tried starred unpacking, without success. Another variant:
((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for linesplit0,linesplit1 in open(doc).readline().split("\n").split("\t"))
But that somehow feels unsatisfactory, too contrived. What do you think?
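A plain loop may be the least contrived standard-library option. The sketch below splits each line exactly once and appends field i to column list i, so it scales to any number of columns. The helper name read_columns and its parameters are hypothetical, and the sample lines stand in for the file.

```python
def read_columns(lines, n_columns, sep=None):
    """Split each line once and append field i to column list i.

    sep=None splits on any run of whitespace; pass sep="\t" for
    strictly tab-separated input.
    """
    columns = [[] for _ in range(n_columns)]
    for line in lines:
        # Drop the trailing newline, then split exactly once per line.
        fields = line.rstrip("\n").split(sep)
        # zip pairs each column list with its field; extra fields are ignored.
        for column, field in zip(columns, fields):
            column.append(field)
    return columns


# Stand-in for the lines of open(doc).
sample = ["word1\tword2\n", "foo\tbar\n"]
list_of_word1, list_of_word2 = read_columns(sample, 2)
print(list_of_word1)  # ['word1', 'foo']
print(list_of_word2)  # ['word2', 'bar']
```

With a real file you could pass the file object directly, e.g. `with open(doc) as f: list_of_word1, list_of_word2 = read_columns(f, 2)`. Note that a row with fewer fields than n_columns silently leaves the later columns shorter.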
Upvotes: 0
Views: 85
Reputation: 4882
This answer works no matter how many spaces separate the columns (provided the delimiter is spaces!):
with open('temp.txt','r') as f:
    data = f.read().strip('\n').split('\n')
dataNoSpace = [filter(lambda a: a!= '', i.split(' ')) for i in data]
list1, list2 = [list(i) for i in zip(*dataNoSpace)]
For example, if 'temp.txt' is:
word10 word20
word11 word21
word12 word22
word13 word23
word14 word24
We get:
list1
['word10', 'word11', 'word12', 'word13', 'word14']
list2
['word20', 'word21', 'word22', 'word23', 'word24']
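A possible simplification of the same approach: str.split() with no argument already splits on any run of whitespace (spaces or tabs) and discards empty strings, so the filter() step is not needed. The sample rows below are assumptions standing in for the file contents; blank lines are skipped because an empty row would make zip() produce nothing.

```python
# Sample rows stand in for the lines of temp.txt (assumption).
rows = ["word10   word20\n", "word11\tword21\n", "\n", "word12 word22\n"]

# split() with no argument handles mixed spaces/tabs; skip blank lines.
list1, list2 = (list(col) for col in zip(*(row.split() for row in rows if row.strip())))
print(list1)  # ['word10', 'word11', 'word12']
print(list2)  # ['word20', 'word21', 'word22']
```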
Upvotes: 1
Reputation: 23264
You can use zip together with argument unpacking to achieve this.
Example input file data.txt:
1 2 3
apple orange banana
one two three
a b c
Code:
>>> with open('data.txt') as f:
... list(zip(*(line.split() for line in f)))
...
[('1', 'apple', 'one', 'a'), ('2', 'orange', 'two', 'b'), ('3', 'banana', 'three', 'c')]
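zip() yields tuples, so if you need actual lists bound to separate names, map(list, ...) can convert them during unpacking. In this sketch io.StringIO stands in for open('data.txt') so it runs standalone, and the names col1 to col3 are illustrative.

```python
import io

# io.StringIO stands in for open('data.txt') (assumption, for a standalone run).
f = io.StringIO("1 2 3\napple orange banana\none two three\na b c\n")

# Each line is split once; zip(*...) transposes rows into columns.
col1, col2, col3 = map(list, zip(*(line.split() for line in f)))
print(col1)  # ['1', 'apple', 'one', 'a']
print(col3)  # ['3', 'banana', 'three', 'c']
```

Keep in mind that zip() stops at the shortest row, so a line with fewer fields truncates every column.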
Upvotes: 1
Reputation: 13356
Maybe this?
lists = [[] for _ in range(number_of_lists)]  # number_of_lists: how many columns you expect
for line in open(doc):
    for lst, field in zip(lists, line.split()):
        lst.append(field)  # mutate each column list in place
(might need some tweaking)
Upvotes: 1
Reputation: 1049
Actually, at first I wanted to use zip, hence the generator. But I mixed things up and ended up adding
list_of_word1=[]
list_of_word2=[]
which are useless here. What I should have written is:
list_of_word1,list_of_word2=zip(*((line.split()) for line in open(doc)))
That works like a charm. Still, the underlying question remains: even though this solves my case, I don't know how to unpack the result of split inside a comprehension. Any ideas?
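One common way to unpack a split result inside a comprehension is to wrap it in a one-element list, so that an extra `for` clause iterates over it exactly once and can unpack it. On Python 3.8+, an assignment expression (`:=`) is another option for binding the split result once per line. The sample lines and variable names here are illustrative assumptions.

```python
lines = ["word1\tword2\n", "foo\tbar\n"]

# The one-element list gives the comprehension something to iterate over
# exactly once, so the split result can be unpacked into a and b there.
pairs = [(b, a) for line in lines for a, b in [line.split()]]
print(pairs)  # [('word2', 'word1'), ('bar', 'foo')]

# Python 3.8+: the walrus binds the split result once per line; the
# condition also skips blank lines because an empty list is falsy.
firsts = [fields[0] for line in lines if (fields := line.split())]
print(firsts)  # ['word1', 'foo']
```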
Upvotes: 0