YuanL
YuanL

Reputation: 67

Best way to compare excel and text file for same data

How do I do the following in Python: for me to compare a text file and excel file to see if they contain the same data? I have the following code (bottom) setup where it looks for just the model and if it is in the text file, it prints out the model, brand and type is in there but sometimes there are multiple same models (don't ask why) so how do I output that?

I would rather have it take one line/cell from the excel sheet and compare it to each line of the text file and print out both lines if it does contain the same data but I am unsure how to go about doing that. Also, if there is a blank line in excel, how would I go about just skipping it?

I have a text file with the following

Toyota sedan Corrola
Honda sedan Accord
Honda SUV CR-V
Toyota sedan Camry
Toyota sedan Avalon
Honda SUV Camry

and an excel file with the following:

enter image description here

Code:

import pandas as pd
file1 = 'cars.xls'
file2 = open("cars.txt", "r")
df = pd.read_excel(file1, header=0)  
readfile = file2.read()

for count in range(len(df['Brand'])):
    if not str(df['Brand'][count]):
        print("Blank line" + str(count))
        print()
        continue

    model = str(df['model'][count])
    model2 = model.replace(' ', '')  # get rid of spaces in case there are any

    if model2 in readfile:
        print(str(df['Brand'][count]) + str(df['type'][count]) + str(df['model'][count]) + " is in both Excel sheet and text file")

desired output: (no particular order) Toyota sedan Corrola is in both Excel sheet and text file Honda SUV CR-V is in both Excel sheet and text file Toyota sedan Camry is in both Excel sheet and text file Honda SUV Camry is in both Excel sheet and text file Nissan sedan Altima is not in both Excel sheet and text file

etc etc

Upvotes: 0

Views: 1750

Answers (1)

Nathan Thomas
Nathan Thomas

Reputation: 270

Turn the contents of your text file into a dataframe, then check if they are equal to each other. You can adapt something like this:

df = pd.DataFrame({"Brand": ["Toyota","Honda",0,"Nissan"],
              "type": ["sedan", "SUV", 0, "sedan"],
              "model": ["Camry", "CR-V", 0, "Altima"]})


df_2 = pd.DataFrame({"Brand": ["Toyota","Honda","Honda","0"],
                  "type": ["sedan", "sedan", "SUV", "0"],
                  "model": ["Honda", "SUV", "CR-V", "0"]})

df == df_2

#output
 Brand   type  model
0   True   True  False
1   True  False  False
2  False  False  False
3  False  False  False

Upvotes: 1

Related Questions