mrg

Reputation: 53

Finding strings in a large text file in Python

The following is my code:

with open("WinUpdates.txt") as f:
    data=[]
    for elem in f:
        data.append(elem)

with open("checked.txt", "w") as f:
    check=True
    for item in data:
        if "KB2982791" in item:
            f.write("KB2982791\n")
            check=False
        if "KB2970228" in item:
            f.write("KB2970228\n")
            check=False
        if "KB2918614" in item:
            f.write("KB2918614\n")
            check=False
        if "KB2993651" in item:
            f.write("KB2993651\n")
            check=False
        if "KB2975719" in item:
            f.write("KB2975719\n")
            check=False
        if "KB2975331" in item:
            f.write("KB2975331\n")
            check=False
        if "KB2506212" in item:
            f.write("KB2506212\n")
            check=False
        if "KB3004394" in item:
            f.write("KB3004394\n")
            check=False
        if "KB3114409" in item:
            f.write("KB3114409\n")
            check=False
        if "KB3114570" in item:
            f.write("KB3114570\n")
            check=False

    if check:
        f.write("No faulty Windows Updates found!")

The "WinUpdates.txt" file contains a lot of lines like these:

http://support.microsoft.com/?kbid=2980245 RECHTS Update
KB2980245 NT-AUTORITÄT\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2981580 RECHTS Update
KB2981580 NT-AUTORITÄT\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2982378 RECHTS Security Update KB2982378 NT-AUTORITÄT\SYSTEM 9/12/2014
http://support.microsoft.com/?kbid=2984972 RECHTS Security Update KB2984972 NT-AUTORITÄT\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2984976 RECHTS Security Update KB2984976 NT-AUTORITÄT\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2984981 RECHTS Security Update KB2984981 NT-AUTORITÄT\SYSTEM 10/16/2014
http://support.microsoft.com/?kbid=2985461 RECHTS Update
KB2985461 NT-AUTORITÄT\SYSTEM 9/12/2014
http://support.microsoft.com/?kbid=2987107 RECHTS Security Update KB2987107 NT-AUTORITÄT\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2990214 RECHTS Update
KB2990214 NT-AUTORITÄT\SYSTEM 4/16/2015
http://support.microsoft.com/?kbid=2991963 RECHTS Security Update KB2991963 NT-AUTORITÄT\SYSTEM 11/14/2014
http://support.microsoft.com/?kbid=2992611 RECHTS Security Update KB2992611 NT-AUTORITÄT\SYSTEM 11/14/2014
http://support.microsoft.com/?kbid=2993651 RECHTS Update
KB2993651 NT-AUTORITÄT\SYSTEM 8/29/2014
http://support.microsoft.com/?kbid=2993958 RECHTS Security Update KB2993958 NT-AUTORITÄT\SYSTEM 11/14/2014

But when I execute my code, it reports that none of those updates were found, even though I know it should find 4. I wrote the "data" list to a new text file, and everything looks fine there.

Why do you think my code does not work?

Upvotes: 1

Views: 236

Answers (2)

PM 2Ring

Reputation: 55479

FWIW, your code can be written in a more compact way that doesn't require a zillion if statements. Also, since the (new) data file is only 63342 bytes you can read the whole thing into a single string, rather than into a list of strings.

kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)

with open("WinUpdates.txt") as f:
    data = f.read()

check = True
with open("checked.txt", "w") as f:
    for kb in kb_ids:
        if kb in data:
            f.write(kb + "\n")
            check = False

    if check:
        f.write("No faulty Windows Updates found!\n")

Contents of checked.txt, using the linked data:

KB2970228
KB2918614
KB2993651
KB2506212
KB3004394

Note that this code writes the found KB ids in the order in which they're defined in kb_ids, rather than the order in which they occur in "WinUpdates.txt".
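If the order of occurrence in the file matters, one possible approach is a single regular-expression scan over the text. The following is only a sketch: the function name found_in_file_order and the sample string are made up for illustration, and only a subset of the ids is listed.

```python
import re

# Subset of the ids from the answer above, for illustration only.
KB_IDS = ("KB2982791", "KB2970228", "KB2993651")

def found_in_file_order(text, kb_ids=KB_IDS):
    """Return each matching id once, in the order it first appears in text."""
    # Build one alternation pattern; re.escape guards against special characters.
    pattern = re.compile("|".join(re.escape(kb) for kb in kb_ids))
    seen = []
    for match in pattern.finditer(text):
        kb = match.group(0)
        if kb not in seen:  # skip repeats of an id already reported
            seen.append(kb)
    return seen

# Hypothetical sample data, not taken from the real WinUpdates.txt.
sample = "x KB2993651 y\nz KB2970228 w\nKB2993651 again\n"
print(found_in_file_order(sample))  # ['KB2993651', 'KB2970228']
```

Unlike the loop over kb_ids, this walks the text once and also drops repeated hits for the same id.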

Searching through the whole file as a string for each KB id is probably not a good idea if the file is large, e.g., more than a megabyte or so; you might want to run some timing tests (using timeit) to see which strategy works best on your data.
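A rough sketch of such a timing test follows. The data here is synthetic (generated lines that only loosely resemble the real file), the two function names are made up for this example, and the numbers will vary by machine, so treat it as a template to run against your own file rather than as a benchmark result.

```python
import timeit

KB_IDS = (
    "KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
    "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570",
)

# Synthetic stand-in for WinUpdates.txt: 5000 generated lines (an assumption).
lines = [
    "http://support.microsoft.com/?kbid=%d Update KB%d\n" % (n, n)
    for n in range(2990000, 2995000)
]
text = "".join(lines)

def search_whole_string():
    # Strategy 1: one substring search over the full text per id.
    return [kb for kb in KB_IDS if kb in text]

def search_line_by_line():
    # Strategy 2: test every id against every line.
    found = []
    for line in lines:
        for kb in KB_IDS:
            if kb in line:
                found.append(kb)
    return found

whole = timeit.timeit(search_whole_string, number=20)
by_line = timeit.timeit(search_line_by_line, number=20)
print("whole string: %.4f s, line by line: %.4f s" % (whole, by_line))
```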

If you want to read a file into a list, there's no need to use a for loop; you can just do this:

with open("WinUpdates.txt") as f:
    data = f.readlines()

Alternatively, you can process the file line by line without reading it into a list:

kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)

check = True
with open("WinUpdates.txt") as fin:
    with open("checked.txt", "w") as fout:
        for data in fin:
            for kb in kb_ids:
                if kb in data:
                    fout.write(kb + "\n")
                    check = False

        if check:
            fout.write("No faulty Windows Updates found!\n")

On more modern versions of Python (2.7 and 3.1 or later), the two with statements can be combined into a single with statement.
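As a sketch of what the combined form could look like: the function name check_updates, the shortened id list, and the temporary demo files are assumptions made so the example runs on its own.

```python
import os
import tempfile

# Shortened id list, for illustration only.
kb_ids = ("KB2993651", "KB2970228")

def check_updates(in_path, out_path):
    found = False
    # Both files are opened by one with statement, separated by a comma.
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            for kb in kb_ids:
                if kb in line:
                    fout.write(kb + "\n")
                    found = True
        if not found:
            fout.write("No faulty Windows Updates found!\n")
    return found

# Demo with throwaway files so the sketch does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "WinUpdates.txt")
    dst = os.path.join(tmp, "checked.txt")
    with open(src, "w") as f:
        f.write("KB2993651 SYSTEM 8/29/2014\n")
    check_updates(src, dst)
    with open(dst) as f:
        result = f.read()

print(result)
```

Both files are closed when the block ends, just as with the nested version.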

Upvotes: 2

Mark Skelton

Reputation: 3891

I changed two things; check the two comments in the code to see what I mean. This worked for me, so it should work for you. Have a great day!

with open("WinUpdates.txt", "r") as f:  # "r" (read) is the default mode; it is spelled out here for clarity
    data = f.read()  # no reason to put the data into a list; a single string will do fine

with open("checked.txt", "w") as f:
    check=True
    if "KB2982791" in data:
        f.write("KB2982791\n")
        check=False
    if "KB2970228" in data:
        f.write("KB2970228\n")
        check=False
    if "KB2918614" in data:
        f.write("KB2918614\n")
        check=False
    if "KB2993651" in data:
        f.write("KB2993651\n")
        check=False
    if "KB2975719" in data:
        f.write("KB2975719\n")
        check=False
    if "KB2975331" in data:
        f.write("KB2975331\n")
        check=False
    if "KB2506212" in data:
        f.write("KB2506212\n")
        check=False
    if "KB3004394" in data:
        f.write("KB3004394\n")
        check=False
    if "KB3114409" in data:
        f.write("KB3114409\n")
        check=False
    if "KB3114570" in data:
        f.write("KB3114570\n")
        check=False

    if check:
        f.write("No faulty Windows Updates found!")

Upvotes: 1
