mdgn15

Reputation: 15

Adding new strings line by line from a file to a new one

The script I run produces a data output file in the following format:

1. xxx %percentage1
2. yyy %percentage1
.
.
.

I want to take only the percentages and append them, line by line, to a file with the same format (creating that file on the first run):

1. xxx %percentage1 %percentage2
2. yyy %percentage1 %percentage2

The main idea is that every time I run the code with a new source data file, it should append that file's percentages to the new file, line by line:

1. xxx %percentage1 %percentage2 %percentage3 ...
2. yyy %percentage1 %percentage2 %percentage3 ...

This is what I could come up with:

import os

os.chdir("directory")

f = open("data1", "r")

n=3

a = f.readlines()
b = []

for i in range(n):
    b.append(a[i].split(" ")[2])

file_lines = []

with open("data1", 'r') as f:
    for t in range(n):
        for x in f.readlines():
            file_lines.append(''.join([x.strip(), b[t], '\n']))
            print(b[t])

with open("data2", 'w') as f:
    f.writelines(file_lines)

With this code I get the new file, but the appended percentages all come from the first line instead of being different for each line. I can also add only one set of percentages: each run overwrites the file rather than adding another column to the existing lines.

I hope I explained it properly; any help would be appreciated.

Upvotes: 0

Views: 49

Answers (1)

Frodon

Reputation: 3775

You can use a dict as a structure to load and write your data. This dict can then be pickled to store the data.

EDIT: added missing return statement

EDIT2: Fix return list of get_data

import pickle
import os

output = 'output'
dump = 'dump'
output_dict = {}
if os.path.exists(dump):
    with open(dump, 'rb') as f:
        output_dict = pickle.load(f)

def read_data(lines):
    """ Builds a dict from a list of lines where the keys are
    a tuple(w1, w2) and the values are w3 where w1, w2 and w3
    are the 3 words composing each line.
    """
    d = {}
    for line in lines:
        elts = line.split()
        if not elts:
            continue  # skip blank lines (e.g. a trailing empty line)
        assert len(elts) == 3, line
        d[tuple(elts[:2])] = elts[2]
    return d

def get_data(data):
    """ Recover data from a dict as a list of strings.
    The formatting for each element of the list is the following:
    k[0] k[1] v
    where k and v are the key/values of the data dict.
    """
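    lines = []
    for k, v in data.items():
        # No trailing '\n' here: the caller joins the lines with '\n',
        # so adding one would produce blank lines in the output file.
        lines.append(' '.join(list(k) + [v]))
    return lines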

def update_data(output_d, new_d):
    """ Update a data dict with new data
    The values are appended if the key already exists.
    Otherwise a new key/value pair is created.
    """
    for k, v in new_d.items():
        if k in output_d:
            output_d[k] = ' '.join([output_d[k], v])
        else:
            output_d[k] = v

for data_file in ('data1', 'data2', 'data3'):
    with open(data_file) as f:
        d1 = read_data(f.readlines())
    update_data(output_dict, d1)

print("Dumping data", output_dict)
with open(dump, 'wb') as f:
    pickle.dump(output_dict, f)
print("Writing data")
with open(output, 'w') as f:
    f.write('\n'.join(get_data(output_dict)))
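If you would rather not keep a separate pickle file, the output file itself can serve as the stored state. The following is only a minimal sketch under some assumptions: lines are matched by position rather than by key, the percentage is always the last word of a source line, and the function name append_percentages is made up for this example.

```python
import os


def append_percentages(source_path, output_path):
    """Append the last word of each source line to the matching output line.

    Lines are matched by position (an assumption of this sketch).
    If the output file does not exist yet, the source lines are copied as-is.
    """
    with open(source_path) as f:
        source = [line.split() for line in f if line.strip()]

    if os.path.exists(output_path):
        with open(output_path) as f:
            existing = [line.rstrip("\n") for line in f if line.strip()]
    else:
        existing = []

    merged = []
    for i, words in enumerate(source):
        if i < len(existing):
            # Line already present: add only the new percentage.
            merged.append(existing[i] + " " + words[-1])
        else:
            # First run (or an extra source line): keep the whole line.
            merged.append(" ".join(words))
    # Keep output lines that have no counterpart in this source file.
    merged.extend(existing[len(source):])

    with open(output_path, "w") as f:
        f.write("\n".join(merged) + "\n")
    return merged
```

Calling append_percentages("data1", "output") once per run adds one column each time. Matching by position is simpler than the dict approach above, but it breaks if the order of the lines changes between runs, in which case keying on the first two words is the safer choice.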

Upvotes: 1
