Reputation: 1891

Compare two csv files in python and skip the given row number

I am new to Python.

I am trying to read two CSV files and check for any differences between them, while skipping the second row of both files.

I started with something like this:

  import sys
  def csv_diff(file_f,file_g):
      #file_f = sys.argv[1]
      #file_g = sys.argv[2]
      set_f = set()
      set_g = set()
      with open(file_f) as f:
          line = f.readline().strip()
          while line:
              set_f.add(line)
              line = f.readline().strip()
      with open(file_g) as g:
          line = g.readline().strip()
          while line:
              set_g.add(line)
              line = g.readline().strip()
      diff = set_f - set_g

      # print set_f
      # print set_g
      # print diff
      if diff:
          #print "Data mismatch between the files"
          return False
      else:
          #print " Data Matches "
          return True

But this code is not reading the first line.

My csv file

File Name : man.csv
Start Time : 2017-02-17T09:46:50
Read Count : 1
Write Count : 0
Filter Count : 0
Skip Count : 1

I want to skip the line: Start Time : 2017-02-17T09:46:50

Is there an easier and better approach?

Upvotes: 1

Answers (3)

pakpe

Reputation: 5479

For each file, you can use readlines() to read all lines, pop the item at index 1 (the second line), convert the remaining lines to a set, and then check whether the two sets are equal.

def csv_diff(file_f,file_g):
    with open(file_f) as f:
        textf = f.readlines()
        textf.pop(1)
        set_f = set(textf)
    with open(file_g) as g:
        textg = g.readlines()
        textg.pop(1)
        set_g = set(textg)
    if set_f == set_g:
        return True
    return False
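
One caveat with the set approach: a set ignores line order and repeated lines, so two files with the same lines in a different order would still compare as equal. If order matters, a list comparison is a possible alternative. The sketch below is only an illustration; the names `read_rows_skipping`, `csv_diff_ordered` and the `skip_index` parameter are made up here and are not part of the question.

```python
def read_rows_skipping(path, skip_index=1):
    """Return the stripped lines of `path`, leaving out the line at `skip_index` (0-based)."""
    with open(path) as handle:
        return [line.strip() for i, line in enumerate(handle) if i != skip_index]


def csv_diff_ordered(file_f, file_g, skip_index=1):
    """Return True when both files hold the same lines in the same order, ignoring one row."""
    return read_rows_skipping(file_f, skip_index) == read_rows_skipping(file_g, skip_index)
```

With the default `skip_index=1`, the second row of each file is dropped before comparing, which matches the "skip the given row number" wording of the question.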

Upvotes: 0

Adam

Reputation: 1565

You can try the following if your CSV has many entries and you always want to skip Start Time. This will also work if your CSV has only one entry.

def csv_diff(file_1, file_2):
    with open(file_1, "r") as f1, open(file_2, "r") as f2:
        for line1, line2 in zip(f1, f2):
            if line1.startswith("Start Time"):
                continue
            if line1.strip() != line2.strip():
                print(f"The two files '{file_1}' and '{file_2}' do not match!")
                return False
    print(f"The two files '{file_1}' and '{file_2}' are a match!")
    return True
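
Note that `zip` stops at the end of the shorter file, so extra trailing lines in one file would go unnoticed. A possible variant, assuming a difference in line count should count as a mismatch, uses `itertools.zip_longest`. The function name `csv_diff_strict` and the `skip_prefix` parameter are invented for this sketch.

```python
from itertools import zip_longest


def csv_diff_strict(file_1, file_2, skip_prefix="Start Time"):
    """Compare two files line by line, ignoring lines that start with `skip_prefix`."""
    with open(file_1) as f1, open(file_2) as f2:
        # zip_longest pads the shorter file with None instead of stopping early.
        for line1, line2 in zip_longest(f1, f2):
            if line1 is None or line2 is None:
                return False  # one file has more lines than the other
            if line1.startswith(skip_prefix) and line2.startswith(skip_prefix):
                continue  # skip only when both files have the line in this position
            if line1.strip() != line2.strip():
                return False
    return True
```

This version returns the result without printing, so the caller can decide what to report.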

Upvotes: 2

garchompstomp

Reputation: 130

Why not just add something simple like:

if "Start Time" not in line:
    set_g.add(line)
# read the next line outside the if, otherwise the loop never advances past a skipped line
line = g.readline().strip()
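
Put together, the filter idea could look roughly like the sketch below. Two things are assumed here that the question does not state: blank lines should be ignored rather than end the read (the original `while line:` loop stops at the first blank line), and a line missing from either file counts as a mismatch (the original `set_f - set_g` only catches lines missing from the second file). The helper name `load_lines` and the `skip_text` parameter are made up for illustration.

```python
def load_lines(path, skip_text="Start Time"):
    """Collect the non-empty, stripped lines of `path` that do not contain `skip_text`."""
    lines = set()
    with open(path) as handle:
        for raw in handle:  # iterating the file replaces the manual readline() calls
            line = raw.strip()
            if line and skip_text not in line:
                lines.add(line)
    return lines


def csv_diff(file_f, file_g):
    """Return True when both files contain the same set of lines, ignoring skipped ones."""
    # Symmetric difference: lines that appear in exactly one of the two files.
    return not (load_lines(file_f) ^ load_lines(file_g))
```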

Upvotes: 0
