chriszanf

Reputation: 35

Search CSV if list element is present then delete

I am new to Python and am trying to read two CSV files with csv.reader, then check whether elements from one are present in the other and, if so, delete that entire row.

I have found answers to similar questions suggesting that a list comprehension is the way to go, but when I loop to check whether each entry of appList exists in machine, the result is an empty list: [].

My code so far is:

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)

for app in appList:
     machine = [app for app in machine if app not in machine]
     print(machine)

The applist.csv looks like this (it's a list of apps in a standard macOS build):

Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
Adobe Extension Manager CC
Adobe Illustrator CC 2015
Adobe InDesign CC 2015
Adobe Photoshop CC 2015
Adobe Media Encoder CC 2015
AirPort Utility 6
App Store
Automator 2
[...]

The machine.csv looks like this...

"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00";"1";"Not covered";""
"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Adobe Extension Manager CC (Mac)";"No license required";"";"";"";"Installation included in software bundle"
"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"

[Updated to add]

My code currently:

#!/usr/local/bin/python3

import os
import csv

def csv_reader(machine_dir, machine):
    mach_list = list(csv.reader(open(machine_dir + "/" + machine, encoding="ISO-8859-1"), delimiter=";"))
    return mach_list

def main():
    # Get the paths to the csv files
    csvFile = input("drop the app list csv here: ")
    machine_dir = input("drop the machines csv folder here: ")

    # Import appList csv
    app_list = list(csv.reader(open(csvFile, encoding = "ISO-8859-1")))

    # Get list of machine csv
    machines = os.listdir(machine_dir)

    for machine in machines:
        machine_list = csv_reader(machine_dir, machine)

        new_machine = [app for app in app_list if app not in machine_list]

        print(new_machine)



if __name__ == '__main__': main()

I'm currently testing it on one machine CSV file, and the result is not what is left after subtracting app_list from machine_list.

Upvotes: 2

Views: 101

Answers (2)

L. MacKenzie

Reputation: 493

Alternatively, you could use pandas (https://pandas.pydata.org/pandas-docs/stable/api.html), assuming that there are no duplicate lines within each file that you want to keep.

import pandas

app = pandas.read_csv('applist.csv', encoding="ISO-8859-1")
machine = pandas.read_csv('machine.csv', encoding="ISO-8859-1")

# Combine both dataframes into one
# (pandas.concat is used because DataFrame.append was removed in pandas 2.0)
dataframe = pandas.concat([app, machine], ignore_index=True)

# Only keep the first of each set of duplicates
# This should give us the machine list (without any of the lines
# duplicated in the applist) plus the full applist
dataframe.drop_duplicates(keep='first', inplace=True)
# Now add the applist again
dataframe = pandas.concat([dataframe, app], ignore_index=True)
# Now drop all the duplicates
# (since the applist was added again, this should drop the entire applist)
dataframe.drop_duplicates(keep=False, inplace=True)
dataframe.reset_index(inplace=True)

# Now 'dataframe' should be the machine list without any lines from applist

If the files are relatively small, a loop will take about as long as pandas, but if the files are large, pandas should be significantly faster.

Upvotes: 1

DavidG

Reputation: 25370

You are using a conventional loop and then running a list comprehension inside it, which I don't think is what you need.

In your list comprehension you are looping through values in machine and then appending values to the list if the values are not in machine. So your logic is a bit off. You actually need to loop through the values of appList in your list comprehension and see whether they appear in the list machine:

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)

new_machine = [app for app in appList if app not in machine]
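
To make the difference concrete, here is a minimal sketch using small in-memory lists in place of the CSV files (the rows are made up for illustration, and they assume both files have the same one-column shape). It shows why the comprehension in the question always comes back empty, and what filtering one list against the other gives in each direction:

```python
# Hypothetical rows standing in for list(csv.reader(...)) on each file.
app_list = [["App Store"], ["Automator 2"]]
machine = [["App Store"], ["Xcode"]]

# The comprehension from the question: every row taken from `machine`
# is, by definition, in `machine`, so the filter rejects all of them.
always_empty = [row for row in machine if row not in machine]

# Filtering one list against the other, in both directions.
only_in_app_list = [row for row in app_list if row not in machine]
only_in_machine = [row for row in machine if row not in app_list]

print(always_empty)      # []
print(only_in_app_list)  # [['Automator 2']]
print(only_in_machine)   # [['Xcode']]
```

Which direction you want depends on the goal: if the aim is "the machine list minus the standard apps", it is the last of the three.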

Edit:

If you inspect the lists you get after reading your files, you will see that they are nested lists (one inner list per row). One solution may be to flatten the lists, then use the same list comprehension:

import csv

appList = csv.reader(open('applist.csv'))
appList = list(appList)

machine = csv.reader(open('machine.csv'))
machine = list(machine)

# Flatten both appList and machine
flat_appList = [item for sublist in appList for item in sublist]
flat_machine = [item for sublist in machine for item in sublist]

new_machine = [app for app in flat_machine if app not in flat_appList]
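
Flattening discards the row structure, so if the goal is to drop whole rows it may be easier to keep the rows and compare only the first column. The following is a rough sketch of that idea, not a tested solution for your real files: the io.StringIO objects are stand-ins for the two CSV files, the sample rows are invented, and it assumes the application name is always the first field and that machine.csv is semicolon-delimited as in your sample.

```python
import csv
import io

# Hypothetical stand-ins for applist.csv and machine.csv.
app_file = io.StringIO("App Store\nAutomator 2\n")
machine_file = io.StringIO(
    '"Application name";"Metric"\n'
    '"App Store";"Installations"\n'
    '"Xcode";"No license required"\n'
)

app_rows = list(csv.reader(app_file))
machine_rows = list(csv.reader(machine_file, delimiter=";"))

# Each row is itself a list, so the result is a list of lists.
print(app_rows)         # [['App Store'], ['Automator 2']]
print(machine_rows[1])  # ['App Store', 'Installations']

# A one-column row never equals a two-column row, so comparing
# whole rows between the two files finds no matches.
print(["App Store"] in machine_rows)  # False

# Compare on the first column only; every kept row stays intact.
app_names = {row[0] for row in app_rows if row}
remaining = [row for row in machine_rows if row and row[0] not in app_names]
print(remaining)
```

Using a set for app_names keeps each membership test cheap, which matters more as the files grow.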

Note: be careful. In the example CSV files, applist.csv contains e.g. "Adobe Creative Cloud for Enterprise", which is not the same string as the corresponding entry in machine.csv, "Adobe Creative Cloud for Enterprise (Mac)".
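
One possible way around that mismatch is to normalize the names before comparing. The sketch below assumes that " (Mac)" is the only suffix that differs, which is all the sample data shows; normalize and drop_known_apps are hypothetical helper names, and "Some Other Tool (Mac)" is an invented row for illustration.

```python
SUFFIX = " (Mac)"  # assumption: the only differing suffix in the sample data


def normalize(name):
    """Strip surrounding whitespace and a trailing ' (Mac)' marker."""
    name = name.strip()
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name


def drop_known_apps(machine_rows, app_names):
    """Return the machine rows whose first field is not a known app name."""
    known = {normalize(name) for name in app_names}
    return [row for row in machine_rows if row and normalize(row[0]) not in known]


# Made-up sample data in the shape of the two files.
machine_rows = [
    ["Application name", "Metric"],
    ["Adobe Bridge CC (Mac)", "No license required"],
    ["Some Other Tool (Mac)", "Installations"],
]
app_names = ["Adobe Bridge CC", "App Store"]

result = drop_known_apps(machine_rows, app_names)
print(result)
# [['Application name', 'Metric'], ['Some Other Tool (Mac)', 'Installations']]
```

The header row survives because "Application name" is not in the app list, and the rows that are kept retain their original, un-normalized text.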

Upvotes: 3
