Reputation: 35
I am new to Python and am trying to import two CSV files using csv.reader, then compare them to see whether elements from one are present in the other and, if so, delete that entire row.
I have found other questions about similar problems that suggest a list comprehension is the way to go, but when I loop to check whether the entries of appList exist in machine, the result I get is an empty list, like this: [].
My code so far is:
import csv
appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)
machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)
for app in appList:
    machine = [app for app in machine if app not in machine]
print(machine)
The applist.csv looks like this (it's a list of apps on a standard macOS build):
Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
Adobe Extension Manager CC
Adobe Illustrator CC 2015
Adobe InDesign CC 2015
Adobe Photoshop CC 2015
Adobe Media Encoder CC 2015
AirPort Utility 6
App Store
Automator 2
[...]
The machine.csv looks like this...
"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00";"1";"Not covered";""
"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Adobe Extension Manager CC (Mac)";"No license required";"";"";"";"Installation included in software bundle"
"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"
[Updated to add]
My code currently:
#!/usr/local/bin/python3
import os
import csv

def csv_reader(machine_dir, machine):
    mach_list = list(csv.reader(open(machine_dir + "/" + machine, encoding="ISO-8859-1"), delimiter=";"))
    return mach_list

def main():
    # Get the paths to the csv files
    csvFile = input("drop the app list csv here: ")
    machine_dir = input("drop the machines csv folder here: ")
    # Import appList csv
    app_list = list(csv.reader(open(csvFile, encoding = "ISO-8859-1")))
    # Get list of machine csv
    machines = os.listdir(machine_dir)
    for machine in machines:
        machine_list = csv_reader(machine_dir, machine)
        new_machine = [app for app in app_list if app not in machine_list]
        print(new_machine)

if __name__ == '__main__': main()
I'm currently testing it on a single machine CSV file, and the result is not what's left after subtracting app_list from machine_list.
Upvotes: 2
Views: 101
Reputation: 493
Alternatively, you could use pandas (https://pandas.pydata.org/pandas-docs/stable/api.html), assuming that there are no duplicate lines within each file that you want to keep.
import pandas

# Assumes both files have the same delimiter and columns, so identical lines compare as equal rows
app = pandas.read_csv('applist.csv', encoding="ISO-8859-1")
machine = pandas.read_csv('machine.csv', encoding="ISO-8859-1")
# Combine both dataframes into one
# (DataFrame.append was removed in pandas 2.0, so pandas.concat is used instead)
dataframe = pandas.concat([app, machine], ignore_index=True)
# Only keep the first of each set of duplicates
# This should give us the machine list (without any of the lines
# duplicated in the applist) plus the full applist
dataframe.drop_duplicates(keep='first', inplace=True)
# Now add the applist again
dataframe = pandas.concat([dataframe, app], ignore_index=True)
# Now drop all the duplicates
# (since the applist was added again, this should drop the entire applist)
dataframe.drop_duplicates(keep=False, inplace=True)
# drop=True stops the old index from being inserted as a new column
dataframe.reset_index(drop=True, inplace=True)
# Now 'dataframe' should be the machine list without any lines from applist
# Now 'dataframe' should be the machine list without any lines from applist
If these files are relatively small, a loop will take about the same time as pandas, but if the files are large, pandas should be significantly faster.
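On the speed point: if you stay with plain Python, much of the loop's cost comes from `x not in some_list`, which scans the whole list on every test. A minimal sketch, assuming your rows come from csv.reader: converting the lookup side to a set makes each membership test constant time on average. Rows are lists, which are unhashable, so they are turned into tuples first. `rows_not_in` is a hypothetical helper name and the rows are made-up sample data.

```python
def rows_not_in(rows, exclude_rows):
    """Return the rows (lists) that do not appear in exclude_rows, preserving order."""
    # Lists are unhashable, so store each excluded row as a tuple in a set
    exclude = {tuple(row) for row in exclude_rows}
    return [row for row in rows if tuple(row) not in exclude]


# Hypothetical sample rows in the shape csv.reader produces
machine = [["App Store"], ["Xcode"], ["Automator 2"]]
appList = [["App Store"], ["Automator 2"]]
print(rows_not_in(machine, appList))  # [['Xcode']]
```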
Upvotes: 1
Reputation: 25370
You are using a conventional loop and then a list comprehension inside it, which I don't think is what you need.
In your list comprehension you are looping through the values in machine and keeping a value only if it is not in machine. Every value taken from machine is, of course, in machine, so the condition is never true and you always get []. You actually need to loop through the values of appList in your list comprehension and check whether they appear in the list machine:
import csv
appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)
machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)
new_machine = [app for app in appList if app not in machine]
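To make the empty result concrete, here is a toy example with made-up rows (hypothetical data, not your real files). It also shows that the direction of the comprehension matters: iterating over appList gives the apps missing from machine, while iterating over machine gives the machine rows with the appList rows removed.

```python
# Hypothetical rows, shaped like what csv.reader returns (one list per row)
appList = [["App Store"], ["Automator 2"]]
machine = [["App Store"], ["Xcode"]]

# Original logic: every row taken from machine is in machine, so nothing survives
original = [app for app in machine if app not in machine]

# appList entries that are missing from machine
missing_from_machine = [app for app in appList if app not in machine]

# machine rows that are not in appList (machine with the appList rows deleted)
machine_without_apps = [row for row in machine if row not in appList]

print(original)              # []
print(missing_from_machine)  # [['Automator 2']]
print(machine_without_apps)  # [['Xcode']]
```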
Edit:
If you inspect the lists after reading your files, you will see that they are nested lists (one inner list per row). One solution may be to flatten the lists and then use a similar list comprehension:
import csv
# Keep the encoding from the question; machine.csv is semicolon-separated
appList = csv.reader(open('applist.csv', encoding="ISO-8859-1"))
appList = list(appList)
machine = csv.reader(open('machine.csv', encoding="ISO-8859-1"), delimiter=";")
machine = list(machine)
# Flatten both appList and machine
flat_appList = [item for sublist in appList for item in sublist]
flat_machine = [item for sublist in machine for item in sublist]
new_machine = [app for app in flat_machine if app not in flat_appList]
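A quick way to see the nesting, and what flattening does, is to feed a couple of lines through csv.reader via io.StringIO. The sample text is abbreviated from the machine.csv shown in the question, and since that file is semicolon-separated the sketch passes `delimiter=";"`. Note that flattening machine.csv puts every field of every row (metric, dates, remarks) into one flat list of strings, so the result no longer contains whole rows.

```python
import csv
import io

# Two lines abbreviated from the question's machine.csv (semicolon-separated)
sample = (
    '"Application name";"Metric"\n'
    '"Adobe Bridge CC (Mac)";"No license required"\n'
)

rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows)
# [['Application name', 'Metric'], ['Adobe Bridge CC (Mac)', 'No license required']]

# Same flattening comprehension as above: one flat list of every field
flat = [item for sublist in rows for item in sublist]
print(flat)
# ['Application name', 'Metric', 'Adobe Bridge CC (Mac)', 'No license required']
```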
Note: be careful. In your example files, applist.csv contains, e.g., Adobe Creative Cloud for Enterprise, which is not the same as the corresponding entry in machine.csv, Adobe Creative Cloud for Enterprise (Mac), so an exact comparison will never match them.
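One way to handle that mismatch while still deleting whole rows is to compare only the application name column after normalizing it. This is a minimal sketch, assuming that applist.csv has one name per line, that machine.csv is semicolon-separated with the name in the first column and a header row, and that the only difference is a trailing " (Mac)" (true for the sample lines shown, but not verified for the full files). `normalize`, `remove_standard_apps`, and the "Some Other App" row are hypothetical.

```python
import csv
import io


def normalize(name):
    """Hypothetical helper: strip whitespace and a trailing ' (Mac)' suffix."""
    name = name.strip()
    suffix = " (Mac)"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def remove_standard_apps(app_lines, machine_lines):
    """Return (header, rows) of machine.csv whose app name is not in the app list."""
    # applist.csv: one app name per line; a set gives fast membership tests
    standard = {normalize(row[0]) for row in csv.reader(app_lines) if row}
    reader = csv.reader(machine_lines, delimiter=";")
    header = next(reader, [])  # assumed header row, e.g. "Application name";...
    kept = [row for row in reader if row and normalize(row[0]) not in standard]
    return header, kept


# Usage with in-memory text; "Some Other App" is a made-up entry
app_text = "Adobe Bridge CC\nApp Store\n"
machine_text = (
    '"Application name";"Metric"\n'
    '"Adobe Bridge CC (Mac)";"No license required"\n'
    '"Some Other App (Mac)";"Installations"\n'
)
header, kept = remove_standard_apps(io.StringIO(app_text), io.StringIO(machine_text))
print(kept)  # [['Some Other App (Mac)', 'Installations']]
```

For the real files you would pass open file objects instead, e.g. `open("machine.csv", encoding="ISO-8859-1", newline="")`, which is the form the csv module documentation recommends.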
Upvotes: 3