Sam

Reputation: 131

Concatenate files in Python 3

Let's say I have two files, file1.txt, file2.txt.

file1.txt is the following:

TITLE   MEARA Repeatv2 Run2 
DATA TYPE       
ORIGIN  JASCO   
OWNER       
DATE    18/03/08    
TIME    22:07:45    
SPECTROMETER/DATA SYSTEM    JASCO Corp., J-715, Rev. 1.00   
RESOLUTION      
DELTAX  -0.1    
XUNITS  NANOMETERS  
YUNITS  CD[mdeg]    
    HT[V]   
FIRSTX  260 
LASTX   200 
NPOINTS 601 
FIRSTY  -4.70495    
MAXY    -4.70277    
MINY    -41.82113   
XYDATA      
260.0   -4.70495    443.669
259.9   -4.70277    443.672
259.8   -4.70929    443.674
259.7   -4.72508    443.681
259.6   -4.72720    443.69

file2.txt is this:

TITLE   MEARA Repeatv2 Run2 
DATA TYPE       
ORIGIN  JASCO   
OWNER       
DATE    18/03/08    
TIME    22:30:34    
SPECTROMETER/DATA SYSTEM    JASCO Corp., J-715, Rev. 1.00   
RESOLUTION      
DELTAX  -0.1    
XUNITS  NANOMETERS  
YUNITS  CD[mdeg]    
    HT[V]   
FIRSTX  260 
LASTX   200 
NPOINTS 601 
FIRSTY  -4.76564    
MAXY    -3.51295    
MINY    -41.95971   
XYDATA      
260 -4.76564    443.152
259.9   -4.77382    443.155
259.8   -4.78663    443.156
259.7   -4.8017 443.162
259.6   -4.83604    443.174

I have written the following Python script to concatenate the two files.

def catFiles(names, outName):
    with open(outName, 'w') as outfile:
        for fname in names:
            with open(fname) as infile:
                outfile.write(infile.read())

This script does concatenate the two files, but it stacks them on top of each other, so that one file comes after the other. How can I modify or rewrite it so that the files are placed next to each other, giving the following output?

TITLE   MEARA Repeatv2 Run2     TITLE   MEARA Repeatv2 Run2 
DATA TYPE           DATA TYPE       
ORIGIN  JASCO       ORIGIN  JASCO   
OWNER           OWNER       
DATE    18/03/08        DATE    18/03/08    
TIME    22:07:45        TIME    22:30:34    
SPECTROMETER/DATA SYSTEM    JASCO Corp., J-715, Rev. 1.00       SPECTROMETER/DATA SYSTEM    JASCO Corp., J-715, Rev. 1.00   
RESOLUTION          RESOLUTION      
DELTAX  -0.1        DELTAX  -0.1    
XUNITS  NANOMETERS      XUNITS  NANOMETERS  
YUNITS  CD[mdeg]        YUNITS  CD[mdeg]    
    HT[V]           HT[V]   
FIRSTX  260     FIRSTX  260 
LASTX   200     LASTX   200 
NPOINTS 601     NPOINTS 601 
FIRSTY  -4.70495        FIRSTY  -4.76564    
MAXY    -4.70277        MAXY    -3.51295    
MINY    -41.82113       MINY    -41.95971   
XYDATA          XYDATA      
260.0   -4.70495    443.669 260.0   -4.76564    443.152
259.9   -4.70277    443.672 259.9   -4.77382    443.155
259.8   -4.70929    443.674 259.8   -4.78663    443.156
259.7   -4.72508    443.681 259.7   -4.80170    443.162
259.6   -4.72720    443.690 259.6   -4.83604    443.174

Upvotes: 1

Views: 162

Answers (2)

zvone

Reputation: 19362

A text file does not actually have two dimensions (width and height) as it may seem when looking at it in a text editor. It actually just has one dimension.

For example, this file:

first line
second line
third line

actually contains a string with two newline (\n) characters:

'first line\nsecond line\nthird line'

Now, let's merge that with another file which has these contents:

blue
cheese

(or: 'blue\ncheese')

The normal way, which stacks the files vertically, simply concatenates the strings:

'first line\nsecond line\nthird lineblue\ncheese'

What you want is something more complex, i.e. merge each line (and probably add some spacing as well):

'first line blue\nsecond line cheese\nthird line'

Doing that directly on two big strings is impractical, so you want to:

  • split each file into a list of lines (e.g. ['first line', 'second line', 'third line'] and ['blue', 'cheese'])
  • merge each line of the first file with the corresponding line of the second file (e.g. 'first line' + ' ' + 'blue')
  • take care of excess lines, because one file may be longer (e.g. 'third line' + '')
  • join the merged lines back together, one per output line
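
The steps above can be sketched on two in-memory strings, before any file handling comes into play. The helper name merge_side_by_side and the single-space default separator are illustrative assumptions, not something from the question.

```python
from itertools import zip_longest


def merge_side_by_side(left_text, right_text, sep=' '):
    """Join two multi-line strings line by line (hypothetical helper)."""
    # Step 1: split each text into a list of lines.
    left_lines = left_text.splitlines()
    right_lines = right_text.splitlines()

    # Steps 2 and 3: pair corresponding lines; the shorter list is
    # padded with '' so the excess lines of the longer one are kept.
    pairs = zip_longest(left_lines, right_lines, fillvalue='')

    # Step 4: glue each pair together, then rejoin with newlines.
    # rstrip() removes trailing whitespace, including the separator
    # left dangling when the right-hand side is empty.
    return '\n'.join((left + sep + right).rstrip() for left, right in pairs)


merged = merge_side_by_side('first line\nsecond line\nthird line',
                            'blue\ncheese')
print(merged)
```

Note that rstrip() also removes any trailing whitespace already present in the right-hand lines; drop it if that whitespace matters.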

Here is how to do that, step by step:

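To read a file as lines, you can do f.read().splitlines(), but it is better to use f.readlines() or just iterate over the file object (for line in f: ...). Note that the last two keep the trailing newline on each line, so it has to be stripped before merging.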
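
As a small illustration of how these ways of reading lines differ, here is a sketch that uses io.StringIO as a stand-in for a real file (an assumption made only so the example runs without any files on disk):

```python
import io

text = 'first line\nsecond line\nthird line'

# read().splitlines() loads everything and drops the newline characters.
split = io.StringIO(text).read().splitlines()

# readlines() also loads everything, but keeps the newline on each line.
kept = io.StringIO(text).readlines()

# Iterating over the file object yields the same lines as readlines(),
# one at a time, without building the whole list first.
iterated = [line for line in io.StringIO(text)]

print(split)
print(kept)
print(iterated == kept)
```

The last line of the sample text has no newline, so it comes back unchanged in all three cases.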

To match corresponding lines of two files, you can use zip_longest from the itertools module:

for left_line, right_line in zip_longest(left_lines, right_lines):
    ...

To concatenate, with padding: '{} {}'.format(left_line, right_line)

All together, verbose:

from itertools import zip_longest

left_lines = []
with open(left_filename, 'rt') as left_file:
    for line in left_file:
        line_without_newline = line.strip('\n')
        left_lines.append(line_without_newline)

right_lines = []
with open(right_filename, 'rt') as right_file:
    for line in right_file:
        line_without_newline = line.strip('\n')
        right_lines.append(line_without_newline)

merged_lines = []
for left_line, right_line in zip_longest(left_lines, right_lines, fillvalue=''):
    merged_lines.append('{}    {}'.format(left_line, right_line))

with open(output_filename, 'wt') as output_file:
    for merged_line in merged_lines:
        output_file.write(merged_line + '\n')

Now you can skip most of the intermediate steps to make it simpler :)

from itertools import zip_longest

with open(left_filename, 'rt') as left_file,\
     open(right_filename, 'rt') as right_file,\
     open(output_filename, 'wt') as output_file:
    for left_line, right_line in zip_longest(left_file, right_file, fillvalue=''):
        output_file.write('{}    {}\n'.format(left_line.strip('\n'),
                                              right_line.strip('\n')))
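
If the goal is a drop-in counterpart to the catFiles(names, outName) function from the question, the same idea might be extended to any number of files. This is only a sketch: the function name cat_files_side_by_side, the sep parameter, and the use of contextlib.ExitStack to manage a variable number of open files are my own choices.

```python
from contextlib import ExitStack
from itertools import zip_longest


def cat_files_side_by_side(names, out_name, sep='    '):
    """Write the files in `names` next to each other into `out_name`."""
    with ExitStack() as stack:
        # ExitStack closes every file it was given when the block exits.
        infiles = [stack.enter_context(open(name)) for name in names]
        outfile = stack.enter_context(open(out_name, 'w'))
        # zip_longest yields one tuple of lines per output row, padded
        # with '' once a shorter file has run out of lines.
        for row in zip_longest(*infiles, fillvalue=''):
            outfile.write(sep.join(line.rstrip('\n') for line in row) + '\n')
```

With the files from the question, the call would be cat_files_side_by_side(['file1.txt', 'file2.txt'], 'out.txt').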

Upvotes: 2

wim

Reputation: 362857

from itertools import zip_longest

with open('file1.txt') as f1, open('file2.txt') as f2, open('out.txt', 'w') as f:
    for left, right in zip_longest(f1, f2, fillvalue='\n'):
        f.write(left.rstrip('\n') + right)
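
A short sketch of why the fill value here is '\n' rather than '': the right-hand line is written with its own newline intact, so once the right file runs out, the fill value has to supply the line ending. The in-memory files below are stand-ins for illustration only.

```python
import io
from itertools import zip_longest

# Stand-ins for file1.txt, file2.txt and out.txt (illustrative only).
f1 = io.StringIO('first line\nsecond line\nthird line\n')
f2 = io.StringIO('blue\ncheese\n')
out = io.StringIO()

for left, right in zip_longest(f1, f2, fillvalue='\n'):
    # The left line loses its newline; the right line (or the '\n'
    # fill value) ends the output row.
    out.write(left.rstrip('\n') + right)

result = out.getvalue()
print(result)
```

No separator is inserted between the two halves, so this relies on the left-hand lines already ending in whitespace, as the lines in the question's files appear to.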

Upvotes: 3
