Hawk81

Python - combine text files (specific lines)

I have two large text files of data from one experiment, and I want to combine them into one file in a special way.

Small sample of data:

file1:

plotA   10 
plotB   9 
plotC   9

file2:

98%
7/10
21
98%
5/10
20
98%
10/10
21

And I would like result like this:

plotA   10  98% 7/10    21
plotB   9   98% 5/10    20
plotC   9   98% 10/10   21

I have no idea how to solve this in Python. I tried to reorder file2 with:

with open('file2.txt') as file2:
    lines = file2.readlines()
aaa = lines[0] + lines[3] + lines[6]
bbb = lines[1] + lines[4] + lines[7]
ccc = lines[2] + lines[5] + lines[8]

and then use zip, but I failed (and this method is slow for large text files, since readlines() loads the whole file into memory).

Any help?
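For reference, the reorder-then-zip idea can work. The attempt above fails because `+` joins the lines into one long string, so `zip` then pairs individual characters rather than lines. Extended slices (`data[0::3]`, `data[1::3]`, `data[2::3]`) give three lists of lines instead. The sketch below assumes file2 holds exactly three lines per line of file1, and the function name is hypothetical. It still reads both files fully into memory.

```python
def combine_by_slicing(lines1, lines2):
    """Join each line of lines1 with the matching three lines of lines2.

    Both arguments are iterables of strings (open file objects work).
    Everything is read into lists first, so memory use grows with file size.
    """
    rows = [line.strip() for line in lines1]
    data = [line.strip() for line in lines2]
    # data[0::3] -> lines 1, 4, 7, ... (the percentages)
    # data[1::3] -> lines 2, 5, 8, ... (the fractions)
    # data[2::3] -> lines 3, 6, 9, ... (the counts)
    return ['\t'.join(parts)
            for parts in zip(rows, data[0::3], data[1::3], data[2::3])]
```

Usage: inside `with open('file1.txt') as f1, open('file2.txt') as f2:`, call `combine_by_slicing(f1, f2)` and print or write each returned row.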

Answers (2)

Kasravnd

You can use itertools.izip_longest to group the lines of file 2 into triples, and then use it again to zip those triples with the lines of the first file:

from itertools import izip_longest

with open('file1.txt') as f1, open('file2.txt') as f2:
    # the same iterator three times, so izip_longest takes three consecutive lines per tuple
    args = [iter(f2)] * 3
    z = izip_longest(f1, izip_longest(*args), fillvalue='-')
    for line, tup in z:
        print '{:11}'.format(line.strip()), '{:5}{:5}{:>5}'.format(*map(str.strip, tup))

If you want to write the result to a new file, open a file for writing and write each line to it instead of printing it.

Result:

plotA   10  98%  7/10    21
plotB   9   98%  5/10    20
plotC   9   98%  10/10   21
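A possible Python 3 version of this answer that writes the rows to a new file, as described above, is sketched below. `zip_longest` is the Python 3 name of `izip_longest`. Assumptions that differ from the answer: fields are tab-separated instead of padded with format widths, the outer loop uses plain `zip` so it stops when file1 runs out (the answer pads with `'-'`), and the function names and file paths are hypothetical placeholders.

```python
from itertools import zip_longest  # Python 3 name of izip_longest


def combine_lines(lines1, lines2, width=3):
    """Yield one tab-separated row per line of lines1.

    Each row is that line plus the next `width` lines of lines2.
    Both inputs are consumed lazily, one line at a time.
    """
    it2 = iter(lines2)
    # The same iterator repeated `width` times: zip_longest pulls from it
    # in turn, so each tuple holds `width` consecutive lines of lines2.
    groups = zip_longest(*[it2] * width, fillvalue='')
    for line, group in zip(lines1, groups):
        yield '\t'.join([line.strip()] + [field.strip() for field in group])


def write_combined(path1, path2, out_path):
    """Write the combined rows to out_path, one row per line."""
    with open(path1) as f1, open(path2) as f2, open(out_path, 'w') as out:
        for row in combine_lines(f1, f2):
            out.write(row + '\n')
```

For example, `write_combined('file1.txt', 'file2.txt', 'combined.txt')` would produce one tab-separated line per plot.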

bufh

Here is an example; you'll have to improve it with error handling and so on :^)

file1 = open('file1')
file2 = open('file2')

# take one line in file1
for line in file1:
    # print result with tabulation to separate fields
    print '\t'.join(
        # the line from file1
        [line.strip()] +
        # and three lines from file2
        [file2.readline().strip() for _ in '123']
    )

Note that I'm using the string '123' because it is shorter than range(3) (and it does not require a function call); it just has to be an iterable of any sort that yields three items.

Reading only the data needed at each step avoids loading the whole files into memory (as you said your files are large).
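The same line-by-line approach in Python 3 might look like the sketch below. It swaps the `'123'` loop over `readline()` for `itertools.islice`, which takes up to three lines from file2's iterator per row and leaves the iterator positioned for the next row. The function name is hypothetical. One behavioral difference: if file2 runs out early, `islice` yields fewer fields, whereas `readline()` returns empty strings.

```python
from itertools import islice


def combine_streaming(lines1, lines2, width=3):
    """Pair each line of lines1 with the next `width` lines of lines2.

    Only one line of lines1 and `width` lines of lines2 are held at a time.
    """
    it2 = iter(lines2)
    for line in lines1:
        # islice consumes up to `width` items from the shared iterator,
        # so the next call starts where this one stopped.
        fields = [field.strip() for field in islice(it2, width)]
        yield '\t'.join([line.strip()] + fields)
```

With open file objects `f1` and `f2`, `for row in combine_streaming(f1, f2): print(row)` prints the combined rows.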

Cheers.
