Reputation: 597
Background: With a Python script, I scrape data (HTML) from a website and write it to a CSV document.
This CSV document looks like this:
Hong Kong;The Jardine Engineering Corporation Limited
Hong Kong;Towngas
Hong Kong;Tricor Services Limited
Hong Kong;UL International Limitied
Hong Kong;Urban Property Management Limited
Hong Kong;VTECH Corporate Services Ltd.
Vietnam;Cam Ranh Computer Co. Ltd
Vietnam;CFTP Company
Vietnam;Chevron Vietnam
First column: Country
Second column: Name
My file has more than 5000 rows.
I need to compare this CSV document to another one (produced by the same script, so it has the same structure) to track potential changes (new lines or removed ones). Ideally I would create a file with all the changes, or print them in the terminal.
*REMEMBER that if something changes in the CSV file (for example, one more row), all the following rows are shifted.*
Upvotes: 0
Views: 1469
Reputation: 597
from difflib import unified_diff

OLD_PATH = r'/Users/abelrossignol/Desktop/1.csv'
NEW_PATH = r'/Users/abelrossignol/Desktop/2.csv'

# Read both files as lists of lines (line endings are kept)
with open(OLD_PATH, 'r') as old:
    old_lines = list(old)
with open(NEW_PATH, 'r') as new:
    new_lines = list(new)

# Write the unified diff of the two files to Out.txt
with open("Out.txt", 'w') as out:
    for line in unified_diff(old_lines, new_lines, fromfile=OLD_PATH, tofile=NEW_PATH):
        out.write(line)
print("Written")
This seems to work perfectly. I'm still trying to understand the structure of Out.txt, but the most difficult part is done.
Thank you very much for your help ;-)
I hope this will be helpful to other people one day.
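On the structure of Out.txt: a unified diff starts with two header lines (`---` and `+++`), followed by hunks that each begin with an `@@` line; inside a hunk, removed lines start with `-` and added lines start with `+`. The following is only a sketch of how that output could be reduced to two plain lists. The function name `summarize_changes` and the sample rows are made up for illustration, and `n=0` is used so that no unchanged context lines appear.

```python
from difflib import unified_diff


def summarize_changes(old_lines, new_lines):
    """Return (added, removed) rows according to a unified diff of two line lists."""
    old = [line.rstrip("\n") for line in old_lines]
    new = [line.rstrip("\n") for line in new_lines]
    added, removed = [], []
    # n=0 drops context lines; lineterm="" keeps the header lines free of newlines
    diff = list(unified_diff(old, new, n=0, lineterm=""))
    # The first two items (if any) are the '---' and '+++' file headers
    for line in diff[2:]:
        if line.startswith("@@"):
            continue  # hunk header, carries only line numbers
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed


# Hypothetical sample data in the same "Country;Name" layout as the question
old_rows = ["Hong Kong;Towngas\n", "Vietnam;CFTP Company\n", "Vietnam;Chevron Vietnam\n"]
new_rows = ["Hong Kong;Towngas\n", "Hong Kong;Tricor Services Limited\n", "Vietnam;Chevron Vietnam\n"]

added, removed = summarize_changes(old_rows, new_rows)
print("added:", added)
print("removed:", removed)
```

With the real files, `old_lines` and `new_lines` from the script above could be passed in directly.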
Upvotes: 0
Reputation: 12486
Use GNU diff. It is a command-line tool designed to do exactly what you want. GUI versions are available.
From Wikipedia:
In computing, diff is a file comparison utility that outputs the differences between two files. It is typically used to show the changes between one version of a file and a former version of the same file. Diff displays the changes made per line for text files. Modern implementations also support binary files. The output is called a "diff", or a patch, since the output can be applied with the Unix program patch. The output of similar file comparison utilities are also called a "diff"; like the use of the word "grep" for describing the act of searching, the word diff is used in jargon as a verb for calculating any difference.
Giving you the benefit of the doubt, you probably tried to Google for something like "Find differences between two csv files from Python". If you set aside the fact that the files are in CSV format, or that they were created using Python, a search for "find differences between text files" would have found GNU diff for you.
Edit:
Adding one line poses no problem for GNU diff. It will find the one line that changed, and tell you about it.
Example:
lws@helios:~$ cat file1
alpha
beta
charlie
delta
echo
foxtrot
lws@helios:~$ cat file2
alpha
beta
charlie
CHAMELEON
delta
echo
foxtrot
lws@helios:~$ diff file1 file2
3a4
> CHAMELEON
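For readers who want to reproduce the same check from Python rather than the shell, here is a rough sketch using `difflib.SequenceMatcher` from the standard library on the same two word lists. It is not what GNU diff does internally; it only illustrates that an inserted line is reported as a single insertion and the shifted lines after it are still matched up.

```python
from difflib import SequenceMatcher

file1 = ["alpha", "beta", "charlie", "delta", "echo", "foxtrot"]
file2 = ["alpha", "beta", "charlie", "CHAMELEON", "delta", "echo", "foxtrot"]

inserted, deleted = [], []
matcher = SequenceMatcher(None, file1, file2)
# Each opcode describes how a slice of file1 maps onto a slice of file2
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag in ("insert", "replace"):
        inserted.extend(file2[j1:j2])
    if tag in ("delete", "replace"):
        deleted.extend(file1[i1:i2])

print("inserted:", inserted)
print("deleted:", deleted)
```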
Upvotes: 1
Reputation: 174622
Welcome to StackOverflow. :)
Your problem boils down to doing a diff between two lists. This is available in Python via difflib.
This example from the manual should help you:
>>> from difflib import ndiff, restore
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu
To print the changes (to stdout here; any open file object works the same way):
>>> import sys
>>> from difflib import unified_diff
>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
--- before.py
+++ after.py
@@ -1,4 +1,4 @@
-bacon
-eggs
-ham
+python
+eggy
+hamster
 guido
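If the position of a row does not matter and you only want to know which rows appeared or disappeared, a set comparison may be a simpler alternative to a line diff. The sketch below is not part of difflib; the helper names `read_rows` and `compare` and the sample data are invented for illustration, and it assumes the `;` delimiter shown in the question. Note that sets collapse duplicate rows, so a row that occurs twice in one file and once in the other would not be reported.

```python
import csv
import io


def read_rows(handle):
    """Read semicolon-separated rows into a set of tuples, skipping blank lines."""
    return {tuple(row) for row in csv.reader(handle, delimiter=";") if row}


def compare(old_handle, new_handle):
    """Return (added, removed) rows as sorted lists of tuples."""
    old_rows = read_rows(old_handle)
    new_rows = read_rows(new_handle)
    return sorted(new_rows - old_rows), sorted(old_rows - new_rows)


# In-memory stand-ins for the two CSV files (hypothetical sample data)
old_csv = io.StringIO("Hong Kong;Towngas\nVietnam;CFTP Company\n")
new_csv = io.StringIO("Hong Kong;Towngas\nHong Kong;Tricor Services Limited\n")

added, removed = compare(old_csv, new_csv)
for row in added:
    print("+", ";".join(row))
for row in removed:
    print("-", ";".join(row))
```

For real files, the `io.StringIO` objects would be replaced by handles from `open(path, newline="")`.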
Upvotes: 1