Reputation: 329
I am trying to take text from two separate files such as:
File 1:
000892834 13.663 0.098 0.871 0.093 0.745 4.611 4795
File 2:
892834 4916 75 37 4857 130 128 4795 4.61 -0.09 0 0
and get an output such as:
892834 13.663 0.098 0.871 0.093 0.745 4.611 4795
892834 4916 4795 -0.09
I have some code that seems close to a solution:
filter_func_1 = lambda x: x >= 15
filter_func_2 = lambda x: (5777 + 100) > x > (5777 - 100)
mergedData = defaultdict(list)
with open('Table1_Karoff.txt') as file_1, open('Table7_Pinsonneault.txt') as file_2, open('Processed_Data.txt', 'w') as outfile:
    for line_1 in file_1:
        splt_file_1 = line_1.split()
        if filter_func_1(splt_file_1[1]):
            mergedData[splt_file_1[0].lstrip('0')].append(line_1)
    for line_2 in file_2:
        splt_file_2 = line_2.split()
        Data = map(itemgetter(0, 1, 8, 9), line_2)
        if filter_func_2(splt_file_2[1]):
            mergedData[splt_file_2[0]].append([' '.join(map(str, i)) for i in Data])
    for k in mergedData:
        if len(mergedData[k]) == 2:
            outfile.write("\n".join(mergedData[k]) + "\n")
return outfile
What this code is supposed to do is create two filters using lambda expressions, test a certain index in each line against the matching filter, and, if the test passes, append that entire line to a list for output. It also strips the '000' from the beginning of the first number in file 1, and checks that the same first number is present in both files.
My Problems are:
1) The file_1 ID # (that first number) does not correctly have all of the 0's stripped from it, even though to my knowledge the code should be doing that. It outputs as 00892834, thus only removing the first 0.
2) After I added the filters, no data at all would be written to the new file, and when I checked to see if the line.split had properly created a new list, it had not, meaning that there was no data to filter because there was no data in the splt_file_# input. This is strange to me and I do not understand how that could possibly happen. I tested for the list creation by adding a writeline at the end that should have written out the splt_file_1 and splt_file_2 lists, however it did not spit out anything.
3) Since the values I needed are not callable in the list from file 2 in order (I need indices 0, 1, 8, 9 only) I tried to map then format the data, but this gives an index out of range issue, which is understandable because of my problem in #2 above.
I need any help I can get in removing these errors. I do not know if my code is wrong or if I'm just missing something. Thanks for any help given.
Upvotes: 0
Views: 601
Reputation: 2826
You're passing strings to filter_func_1 and filter_func_2 and then comparing them with integers inside the lambdas. But when you compare numbers and strings, the comparison is degenerate: numerics are always considered to precede strings (this is implementation-specific; I'm assuming CPython 2 behavior, and in Python 3 the same comparison raises a TypeError instead). So your first lambda is always going to return True and the second False. As a result, they're not functioning as filters in your code.
You need to convert the strings you pass to integers or floats, e.g.:
filter_func_1 = lambda x: float(x) >= 15
Or you could convert your input before passing it to a filter. In either case you should think about what you want to do when the input can't be converted to a numeric type.
Converting to a numeric type will also get rid of leading 0s. It might or might not help with your second problem, but in any case you won't get the results you expect until you make a change to this part of your code.
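As a rough sketch of the conversion idea above, the filters could convert their argument themselves and treat anything non-numeric as a failed match. The helper names `to_float` and `normalize_id` are made up for illustration, and rejecting unconvertible input (rather than raising) is just one possible policy:

```python
def to_float(text):
    """Return text as a float, or None if it cannot be converted."""
    try:
        return float(text)
    except ValueError:
        return None


def filter_func_1(text):
    # Assumption: non-numeric input simply fails the filter.
    value = to_float(text)
    return value is not None and value >= 15


def filter_func_2(text):
    # Same window as the question: strictly within 100 of 5777.
    value = to_float(text)
    return value is not None and (5777 - 100) < value < (5777 + 100)


def normalize_id(text):
    # int() drops every leading zero, so '000892834' and '892834'
    # end up as the same dictionary key.
    return int(text)
```

With this, `filter_func_1(splt_file_1[1])` works directly on the string fields produced by `split()`, and `normalize_id(splt_file_1[0])` can replace the `lstrip('0')` call.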
Upvotes: 1
Reputation: 10881
Sorry not to correct your solution, but sometimes a different take can also be helpful. This would be my code, if I understand you correctly.
file_1_data = dict()
file_2_data = dict()
for filename, data in [('infile1.txt', file_1_data), ('infile2.txt', file_2_data)]:
    with open(filename) as f:
        for line in f:
            split_line = line.split()
            first_int = int(split_line[0])
            rest_floats = [float(value) for value in split_line[1:]]
            data[first_int] = rest_floats
Now you have dictionaries for both files, where the keys are int, so you can compare those, and the values are lists of floats. After this it's pretty easy.
def filter_1(x):
    return x > 1

def filter_2(x):
    return 4 < x < 100000

# text mode ('w'), since the records written below are str
with open('outfile.txt', 'w') as outfile:
    for key in file_1_data:
        if key in file_2_data:
            # write a record, the first one
            data_to_write = [str(f) for f in file_1_data[key] if filter_1(f)]
            record = ' '.join([str(key)] + data_to_write) + '\n'
            outfile.write(record)
            # second one, do filtering here
            data_to_write = [str(f) for f in file_2_data[key] if filter_2(f)]
            record = ' '.join([str(key)] + data_to_write) + '\n'
            outfile.write(record)
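The filters above act on each value. If the goal is instead to keep or drop whole rows based on one column, and to pick only columns 0, 1, 8 and 9 from the second file, a variation might look like the sketch below. It applies `itemgetter` to the split list rather than to the raw line (calling it on the line itself indexes into single characters, which explains the index-out-of-range error). The function names `select_row` and `merge_lines` are hypothetical, the thresholds are the ones from the question, and every tested column is assumed to be numeric:

```python
from operator import itemgetter

# Column indices named in the question; adjust if others are needed.
pick_columns = itemgetter(0, 1, 8, 9)


def select_row(line, column, predicate):
    """Return the split fields if fields[column] passes predicate, else None."""
    fields = line.split()
    if len(fields) <= column:
        return None  # blank or short line
    # Assumption: the tested column always holds a number.
    return fields if predicate(float(fields[column])) else None


def merge_lines(lines_1, lines_2):
    """Pair rows that pass both filters and share the same integer ID."""
    rows_1 = {}
    for line in lines_1:
        fields = select_row(line, 1, lambda x: x >= 15)
        if fields:
            rows_1[int(fields[0])] = fields[1:]

    merged = []
    for line in lines_2:
        fields = select_row(line, 1, lambda x: 5677 < x < 5877)
        if fields and len(fields) > 9 and int(fields[0]) in rows_1:
            key = int(fields[0])
            merged.append(' '.join([str(key)] + rows_1[key]))
            merged.append(' '.join([str(key)] + list(pick_columns(fields)[1:])))
    return merged
```

Because `merge_lines` accepts any iterable of lines, it can be fed the two open file objects directly, and the returned strings can be written out with `outfile.write('\n'.join(merged) + '\n')`.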
Hope it helps. I think my point here is: Don't worry about being a bit verbose or on the simplistic side, just make it easy for yourself and don't repeat yourself if you can avoid it. Good luck.
Upvotes: 1