Sam

Reputation: 67

Python: Remove all lines of csv files that do not match within a range

I have been given two sets of data in the form of csv files which have 23 columns and thousands of lines of data.

The data in column 14 corresponds to the positions of stars in an image of a galaxy.

The issue is that one set of data contains positions that do not exist in the other set. Both sets need to contain the same positions, but the positions are offset by 0.0002 between the two data sets.

The values in F435W.csv are 0.0002 greater than the values in F550M.csv. I am trying to find the matches between the two files, but within a certain range, because all values are off by that amount.

Then, I need to delete all lines of data that correspond to values that do not match.

Below is a sample of the data from each of the two files:

F435W.csv:

NUMBER,FLUX_APER,FLUXERR_APER,MAG_APER,MAGERR_APER,FLUX_BEST,FLUXERR_BEST,MAG_BEST,MAGERR_BEST,BACKGROUND,X_IMAGE,Y_IMAGE,ALPHA_J2000,DELTA_J2000,X2_IMAGE,Y2_IMAGE,XY_IMAGE,A_IMAGE,B_IMAGE,THETA_IMAGE,ERRA_IMAGE,ERRB_IMAGE,ERRTHETA_IMAGE
1,2017.013,0.01242859,-8.2618,0,51434.12,0.3269918,-11.7781,0,0.01957931,1387.9406,541.916,49.9898514,41.5266996,8.81E+01,1.63E+03,1.44E+02,40.535,8.65,84.72,0.00061,0.00035,62.14
2,84.73392,0.01245409,-4.8201,0.0002,112.9723,0.04012135,-5.1324,0.0004,-0.002142646,150.306,146.7986,49.9942613,41.5444109,4.92E+00,5.60E+00,-2.02E-01,2.379,2.206,-74.69,0.00339,0.0029,88.88
3,215.1939,0.01242859,-5.8321,0.0001,262.2751,0.03840466,-6.0469,0.0002,-0.002961465,3248.686,52.8478,50.003155,41.5019044,4.77E+00,5.05E+00,-1.63E-01,2.263,2.166,-65.29,0.002,0.0019,-66.78
4,0.3796681,0.01240305,1.0515,0.0355,0.5823653,0.05487975,0.587,0.1023,-0.00425157,3760.344,11.113,50.0051049,41.4949256,1.93E+00,1.02E+00,-7.42E-02,1.393,1.007,-4.61,0.05461,0.03818,-6.68
5,0.9584663,0.01249223,0.0461,0.0142,1.043696,0.0175857,-0.0464,0.0183,-0.004156116,4013.2063,9.1225,50.0057256,41.4914444,1.12E+00,9.75E-01,1.09E-01,1.085,0.957,28.34,0.01934,0.01745,44.01

F550M.csv:

NUMBER,FLUX_APER,FLUXERR_APER,MAG_APER,MAGERR_APER,FLUX_BEST,FLUXERR_BEST,MAG_BEST,MAGERR_BEST,BACKGROUND,X_IMAGE,Y_IMAGE,ALPHA_J2000,DELTA_J2000,X2_IMAGE,Y2_IMAGE,XY_IMAGE,A_IMAGE,B_IMAGE,THETA_IMAGE,ERRA_IMAGE,ERRB_IMAGE,ERRTHETA_IMAGE,,FALSE
2,1921.566,0.01258874,-8.2091,0,37128.06,0.2618096,-11.4243,0,0.01455503,4617.5225,554.576,49.9887896,41.5264699,6.09E+01,8.09E+02,1.78E+01,28.459,7.779,88.63,0.00054,0.00036,77.04,,
3,1.055918,0.01256313,-0.0591,0.0129,9.834856,0.1109255,-2.4819,0.0122,-0.002955142,3936.4946,85.3255,49.9949149,41.5370016,3.98E+01,1.23E+01,1.54E+01,6.83,2.336,24.13,0.06362,0.01965,23.98,,
4,151.2355,0.01260153,-5.4491,0.0001,184.0693,0.03634057,-5.6625,0.0002,-0.002626019,3409.2642,76.9891,49.9931935,41.5442109,4.02E+00,4.35E+00,-1.47E-03,2.086,2.005,-89.75,0.00227,0.00198,66.61,,
5,0.3506025,0.01258874,1.138,0.039,0.3466277,0.01300407,1.1503,0.0407,-0.002441164,3351.9893,8.9147,49.9942299,41.5451727,4.97E-01,5.07E-01,7.21E-03,0.715,0.702,62.75,0.02,0.01989,82.88

Below is the code I have so far, but I'm unsure how to find matches based on that specific column. I am very new to Python, and this task is probably way beyond my knowledge of Python, but I desperately need to figure it out. I've been working on this single task for weeks, trying different methods. Thank you in advance!

import csv

with open('F435W.csv') as csvF435:
    readCSV = csv.reader(csvF435, delimiter=',')

with open('F550M.csv') as csvF550:
    readCSV = csv.reader(csvF550, delimiter=',')

for x in range (0,6348):
    a = csvF435[x]
    for y in range(0,6349):
        b = csvF550[y]

        if b < a + 0.0002 and b > a - 0.0002:
            newlist.append(b)
            break

Upvotes: 1

Views: 460

Answers (1)

Allan

Reputation: 12456

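You can use the following sample. It keeps the rows of F435W.csv whose column 14 value (index 13, DELTA_J2000) is within interval of a column 14 value in F550M.csv, and writes them to merge.csv: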

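import csv

def isfloat(value):
    """Return True if value can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False

interval = 0.0002
col = 13  # column 14 (DELTA_J2000), zero-based index

# read the column 14 values of F550M.csv once instead of reopening the file for every row
with open('F550M.csv', newline='') as csvF550:
    values550 = [float(l550[col]) for l550 in csv.reader(csvF550, delimiter=',')
                 if len(l550) > col and isfloat(l550[col])]

# opening merge.csv with "w" already empties it, so no separate cleaning step is needed
with open('F435W.csv', newline='') as csvF435, open('merge.csv', 'w', newline='') as merge_out:
    csvF435_in = csv.reader(csvF435, delimiter=',')
    writer = csv.writer(merge_out, lineterminator='\n')
    # write the header of the output csv file
    writer.writerow(next(csvF435_in))
    for l435 in csvF435_in:
        if len(l435) <= col or not isfloat(l435[col]):
            continue
        # write a row once, even if several F550M.csv values fall within the interval
        if any(abs(float(l435[col]) - v) < interval for v in values550):
            writer.writerow(l435)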
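One thing to watch with the comparison above: the question says the two files differ by 0.0002, which is the same as interval, so a true match sits right on the boundary of a strict `<` test. Floating-point subtraction rarely gives exactly 0.0002, so such a pair can land on either side. A minimal sketch of a more forgiving test follows; the helper name `within` and the `slack` value of 1e-9 are assumptions for illustration, not part of the original code, and should be adapted to the precision of your data.

```python
interval = 0.0002

def within(a, b, tol=interval, slack=1e-9):
    """Return True if a and b differ by at most tol.

    slack is a hypothetical safety margin that absorbs rounding error
    when the real difference is equal to tol.
    """
    return abs(a - b) <= tol + slack

a = 41.5444109  # DELTA_J2000 of row 2 in the F435W.csv sample
b = 41.5442109  # DELTA_J2000 of row 4 in the F550M.csv sample
print(abs(a - b))    # very close to 0.0002, but not guaranteed to be exactly 0.0002
print(within(a, b))  # True
```

If you adopt something like this, replace the `abs(...) < interval` condition with a call to the helper.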

F435W.csv:

NUMBER,FLUX_APER,FLUXERR_APER,MAG_APER,MAGERR_APER,FLUX_BEST,FLUXERR_BEST,MAG_BEST,MAGERR_BEST,BACKGROUND,X_IMAGE,Y_IMAGE,ALPHA_J2000,DELTA_J2000,X2_IMAGE,Y2_IMAGE,XY_IMAGE,A_IMAGE,B_IMAGE,THETA_IMAGE,ERRA_IMAGE,ERRB_IMAGE,ERRTHETA_IMAGE
1,2017.013,0.01242859,-8.2618,0,51434.12,0.3269918,-11.7781,0,0.01957931,1387.9406,541.916,49.9898514,41.5266996,8.81E+01,1.63E+03,1.44E+02,40.535,8.65,84.72,0.00061,0.00035,62.14
2,84.73392,0.01245409,-4.8201,0.0002,112.9723,0.04012135,-5.1324,0.0004,-0.002142646,150.306,146.7986,49.9942613,41.5444109,4.92E+00,5.60E+00,-2.02E-01,2.379,2.206,-74.69,0.00339,0.0029,88.88
3,215.1939,0.01242859,-5.8321,0.0001,262.2751,0.03840466,-6.0469,0.0002,-0.002961465,3248.686,52.8478,50.003155,41.5019044,4.77E+00,5.05E+00,-1.63E-01,2.263,2.166,-65.29,0.002,0.0019,-66.78
4,0.3796681,0.01240305,1.0515,0.0355,0.5823653,0.05487975,0.587,0.1023,-0.00425157,3760.344,11.113,50.0051049,41.4949256,1.93E+00,1.02E+00,-7.42E-02,1.393,1.007,-4.61,0.05461,0.03818,-6.68
5,0.9584663,0.01249223,0.0461,0.0142,1.043696,0.0175857,-0.0464,0.0183,-0.004156116,4013.2063,9.1225,50.0057256,41.4914444,1.12E+00,9.75E-01,1.09E-01,1.085,0.957,28.34,0.01934,0.01745,44.01

F550M.csv:

NUMBER,FLUX_APER,FLUXERR_APER,MAG_APER,MAGERR_APER,FLUX_BEST,FLUXERR_BEST,MAG_BEST,MAGERR_BEST,BACKGROUND,X_IMAGE,Y_IMAGE,ALPHA_J2000,DELTA_J2000,X2_IMAGE,Y2_IMAGE,XY_IMAGE,A_IMAGE,B_IMAGE,THETA_IMAGE,ERRA_IMAGE,ERRB_IMAGE,ERRTHETA_IMAGE,,FALSE
2,1921.566,0.01258874,-8.2091,0,37128.06,0.2618096,-11.4243,0,0.01455503,4617.5225,554.576,49.9887896,41.5264699,6.09E+01,8.09E+02,1.78E+01,28.459,7.779,88.63,0.00054,0.00036,77.04,,
3,1.055918,0.01256313,-0.0591,0.0129,9.834856,0.1109255,-2.4819,0.0122,-0.002955142,3936.4946,85.3255,49.9949149,41.5370016,3.98E+01,1.23E+01,1.54E+01,6.83,2.336,24.13,0.06362,0.01965,23.98,,
4,151.2355,0.01260153,-5.4491,0.0001,184.0693,0.03634057,-5.6625,0.0002,-0.002626019,3409.2642,76.9891,49.9931935,41.5442109,4.02E+00,4.35E+00,-1.47E-03,2.086,2.005,-89.75,0.00227,0.00198,66.61,,
5,0.3506025,0.01258874,1.138,0.039,0.3466277,0.01300407,1.1503,0.0407,-0.002441164,3351.9893,8.9147,49.9942299,41.5451727,4.97E-01,5.07E-01,7.21E-03,0.715,0.702,62.75,0.02,0.01989,82.88

merge.csv:

NUMBER,FLUX_APER,FLUXERR_APER,MAG_APER,MAGERR_APER,FLUX_BEST,FLUXERR_BEST,MAG_BEST,MAGERR_BEST,BACKGROUND,X_IMAGE,Y_IMAGE,ALPHA_J2000,DELTA_J2000,X2_IMAGE,Y2_IMAGE,XY_IMAGE,A_IMAGE,B_IMAGE,THETA_IMAGE,ERRA_IMAGE,ERRB_IMAGE,ERRTHETA_IMAGE
2,84.73392,0.01245409,-4.8201,0.0002,112.9723,0.04012135,-5.1324,0.0004,-0.002142646,150.306,146.7986,49.9942613,41.5444109,4.92E+00,5.60E+00,-2.02E-01,2.379,2.206,-74.69,0.00339,0.0029,88.88
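With thousands of lines in each file, comparing every row against every other row can get slow. As a rough sketch of an alternative, you could sort the reference values once and use a binary search per row. The function name `filter_by_tolerance` and its parameters are made up for this example, and unlike the code above it treats the boundary as inclusive (`<=`).

```python
import bisect

def filter_by_tolerance(rows, reference_values, col=13, tol=0.0002):
    """Keep rows whose value in column `col` lies within `tol` of any reference value."""
    ref = sorted(reference_values)
    kept = []
    for row in rows:
        try:
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip headers, blank lines and short lines
        # index of the first reference value that is >= value - tol
        i = bisect.bisect_left(ref, value - tol)
        # that value is a match if it is also <= value + tol
        if i < len(ref) and ref[i] <= value + tol:
            kept.append(row)
    return kept
```

Here `rows` would be the rows of one file as returned by `csv.reader`, and `reference_values` the column 14 floats from the other file. Since both files need to end up with the same positions, you would run it twice, once in each direction.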

Upvotes: 1
