Python text file manipulation using dictionaries and filters

Question

I am trying to take text from two separate files such as:

File 1:
000892834     13.663      0.098      0.871      0.093      0.745      4.611       4795

File 2:
892834  4916   75   37  4857 130 128  4795  4.61 -0.09    0 0

and get an output such as:

892834     13.663      0.098      0.871      0.093      0.745      4.611       4795
892834     4916        4795       -0.09

I have some code that seems close to a solution:

filter_func_1 = lambda x: x >= 15   
filter_func_2 = lambda x: (5777 + 100) > x > (5777 - 100)
mergedData = defaultdict(list)
with open('Table1_Karoff.txt') as file_1, open('Table7_Pinsonneault.txt') as file_2, open('Processed_Data.txt', 'w') as outfile:
        for line_1 in file_1:
            splt_file_1 = line_1.split()
            if filter_func_1(splt_file_1[1]):
                 mergedData[splt_file_1[0].lstrip('0')].append(line_1)
        for line_2 in file_2:
                splt_file_2 = line_2.split()
        Data = map(itemgetter(0, 1, 8, 9), line_2)
            if filter_func_2(splt_file_2[1]):
                 mergedData[splt_file_2[0]].append(['   '.join(map(str, i)) for i in Data])
        for k in mergedData:
            if len(mergedData[k]) == 2:
                outfile.write("\n".join(mergedData[k]) + "\n")
        return outfile

What this code is supposed to do is create two filters using lambda expressions, test a particular field of each line against the matching filter, and, if the test passes, append that entire line to a list for output. It also strips the leading zeros from the first number in file 1 and checks that the same first number is present in both files.

My problems are:

1) The file 1 ID (the first number) does not have all of its leading zeros stripped, even though as far as I know the code should do that. It is output as 00892834, so only the first 0 is removed.

2) After I added the filters, no data at all was written to the new file. When I checked whether line.split had created a new list, it had not, so there was nothing to filter because splt_file_1 and splt_file_2 were empty. This is strange to me and I do not understand how it could happen. I tested for the list creation by adding a write call at the end that should have written out the splt_file_1 and splt_file_2 lists, but it wrote nothing.

3) Since the values I need from file 2 are not adjacent in the list (I need only indices 0, 1, 8, and 9), I tried to map and then format the data, but this raises an index-out-of-range error, which is understandable given problem 2 above.

I would appreciate any help in removing these errors. I do not know whether my code is wrong or whether I'm just missing something. Thanks for any help given.

Gijs · Accepted Answer

Sorry for not correcting your solution directly, but sometimes a different take can also be helpful. This would be my code, if I understand you correctly.

file_1_data = dict()
file_2_data = dict()
for filename, data in [('infile1.txt', file_1_data), ('infile2.txt', file_2_data)]:
    with open(filename) as f:
        for line in f:
            split_line = line.split()
            if not split_line:
                continue  # skip blank lines
            # int() ignores leading zeros, so '000892834' and '892834' give the same key
            first_int = int(split_line[0])
            rest_floats = [float(value) for value in split_line[1:]]
            data[first_int] = rest_floats

Now you have dictionaries for both files, where the keys are int, so you can compare those, and the values are lists of floats. After this it's pretty easy.
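
To illustrate why int keys make the comparison easy, here is a minimal sketch that uses the two sample lines from the question in place of real files. The helper name parse_line is hypothetical and only there for the illustration.

```python
def parse_line(line):
    """Split one whitespace-separated line into (int ID, list of floats)."""
    fields = line.split()
    return int(fields[0]), [float(value) for value in fields[1:]]


# Sample lines copied from the question.
line_1 = "000892834     13.663      0.098      0.871      0.093      0.745      4.611       4795"
line_2 = "892834  4916   75   37  4857 130 128  4795  4.61 -0.09    0 0"

key_1, values_1 = parse_line(line_1)
key_2, values_2 = parse_line(line_2)

# int() ignores leading zeros, so both IDs become the same dictionary key.
print(key_1, key_2, key_1 == key_2)

# IDs present in both files can be found by intersecting the key views.
file_1_data = {key_1: values_1}
file_2_data = {key_2: values_2}
common_ids = sorted(file_1_data.keys() & file_2_data.keys())
print(common_ids)
```

Because the keys are numbers rather than strings, no lstrip('0') step is needed before matching records.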

def filter_1(x):
    return x > 1

def filter_2(x):
    return 4 < x < 100000

# Text mode ('w'), since the records are str; 'wb' would require bytes in Python 3.
with open('outfile.txt', 'w') as outfile:
    for key in file_1_data:
        if key in file_2_data:
            # write a record, the first one
            data_to_write = [str(f) for f in file_1_data[key] if filter_1(f)]
            record = '  '.join([str(key)] + data_to_write) + '\n'
            outfile.write(record)
            # second one, do filtering here
            data_to_write = [str(f) for f in file_2_data[key] if filter_2(f)]
            record = '  '.join([str(key)] + data_to_write) + '\n'
            outfile.write(record)
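
The code above applies the filters to every value in a record. If the aim is instead what the question describes, keeping or dropping a whole record based on one column and writing only selected columns from file 2, a sketch along the following lines may be closer. It is an illustration under assumptions, not a drop-in fix: the helper names load_table and write_merged are hypothetical, and the default column tuple follows the question's text (0, 1, 8, 9), even though the sample output shown in the question looks more like columns 0, 1, 7, 9.

```python
from operator import itemgetter


def load_table(path, keep_record):
    """Map int ID -> list of string fields for records that pass keep_record.

    keep_record receives the second column converted to float, because
    testing the raw string against a number is not a numeric comparison.
    """
    table = {}
    with open(path) as infile:
        for line in infile:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            if keep_record(float(fields[1])):
                table[int(fields[0])] = fields  # int() drops leading zeros
    return table


def write_merged(table_1, table_2, out_path, columns=(0, 1, 8, 9)):
    """Write two lines per ID found in both tables; return the number of IDs written.

    Assumes at least two column indices, so itemgetter returns a tuple.
    """
    pick = itemgetter(*columns)  # applied to the split list, not the raw line
    written = 0
    with open(out_path, 'w') as outfile:
        for key in sorted(table_1.keys() & table_2.keys()):
            fields_1 = table_1[key]
            fields_2 = table_2[key]
            if len(fields_2) <= max(columns):
                continue  # not enough columns to select from
            outfile.write('  '.join([str(key)] + fields_1[1:]) + '\n')
            # Column 0 is among the selected columns, so the ID is already included.
            outfile.write('  '.join(pick(fields_2)) + '\n')
            written += 1
    return written
```

With the question's filters this would be called roughly as load_table('Table1_Karoff.txt', lambda x: x >= 15) and load_table('Table7_Pinsonneault.txt', lambda x: 5777 - 100 < x < 5777 + 100), followed by write_merged(table_1, table_2, 'Processed_Data.txt'). Note that the sample lines shown in the question would not pass those particular thresholds.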

Hope it helps. I think my point here is: Don't worry about being a bit verbose or on the simplistic side, just make it easy for yourself and don't repeat yourself if you can avoid it. Good luck.
