durantejohn

Reputation: 21

Python returns different results on identical text files

I'm new to programming. I'm using PowerShell to filter records from a remote server's Windows security event log and write them to a text file. I'm using a Python script to count how many times user names appear in the text. When I run it against the original text file, Python prints an empty dictionary: {}. But if I copy the contents of the text file, paste them into a new text file, and run my Python script against that, it returns the correct count: {'name1': 2, 'name2': 13, 'name3': 1, 'name4': 1, 'name5': 2, 'name6': 2}. The text files look identical and the character positions are identical. What could be the problem?

PowerShell

Get-WinEvent -LogName "Security" -ComputerName server01 | Where-Object {$_.ID -eq 4663} | where Message -CNotLike "*name1*" | where Message -CNotLike "*name2*" | Format-List -Property * | Out-File "C:\apowershell\winsec\events.txt"

Python

fhand = open('events2.txt')
counts = dict()
for line in fhand:
    if line.startswith('            Account Name:'):
        words = line.split()
        words.remove('Account')
        words.remove('Name:')
        for word in words:
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1
print(counts)

Log record

Message : An attempt was made to access an object.

      Subject:
        Security ID:        S-1-5-21-495698755-754321212-623647154-4521
        Account Name:       name1
        Account Domain:     companydomain
        Logon ID:       0x8CB9C5024

      Object:
        Object Server:      Security
        Object Type:        File
        Object Name:        e:\share\file.txt
        Handle ID:      0x439c
        Resource Attributes:    S:PAI

      Process Information:
        Process ID:     0x2de8
        Process Name:       C:\Windows\System32\memshell.exe

      Access Request Information:
        Accesses:       Execute/Traverse

        Access Mask:        0x20

Upvotes: 2

Views: 359

Answers (1)

Kurtis Rader

Reputation: 7459

The answer is in your problem statement. You're reading a file created on MS Windows with a Python program running on a (presumably) non-Windows system.

The problem is that the character encoding of the original file doesn't match what your Python program expects. Specifically, the original file is in UCS-2 (or UTF-16) encoding. If you're running your Python code on a Unix-like OS, it's probably expecting UTF-8, but that depends on your locale; look at the output of the locale command. Google "python utf-16 decode" for ideas about how to deal with this. Personally, though, rather than trying to get the Python program to handle UTF-16, I would try to find a way to get the content converted to UTF-8 on the Windows system.
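
As a rough illustration of the first option (letting Python decode the file itself), here is a minimal sketch. It assumes the file was written with a byte-order mark, which is what Out-File normally does in Windows PowerShell. The function names sniff_bom and count_account_names are made up for this example, and the counting logic simply mirrors the loop in the question.

```python
import codecs
from collections import Counter


def sniff_bom(path):
    """Guess a codec name from the file's byte-order mark, or return None.

    Only UTF-8 and UTF-16 marks are checked; UTF-32 is not handled here.
    """
    with open(path, "rb") as raw:
        head = raw.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the mark and picks the byte order from it.
        return "utf-16"
    return None


def count_account_names(path, encoding=None):
    """Count the words that follow 'Account Name:' on each matching line."""
    if encoding is None:
        # Assumption: a file without a byte-order mark is plain UTF-8.
        encoding = sniff_bom(path) or "utf-8"
    counts = Counter()
    with open(path, encoding=encoding) as fhand:
        for line in fhand:
            # Strip the indentation so the match does not depend on an
            # exact number of leading spaces or tabs.
            if line.lstrip().startswith("Account Name:"):
                # Drop the two label words, as the original script does.
                counts.update(line.split()[2:])
    return dict(counts)


if __name__ == "__main__":
    # 'events.txt' is the file name from the question's Out-File command.
    print(sniff_bom("events.txt"))
    print(count_account_names("events.txt"))
```

If sniff_bom reports "utf-16" for the original file and None for the pasted copy, that would explain why the two look identical in an editor but behave differently: in UTF-16, each ASCII character is stored with an extra zero byte, so a comparison against 'Account Name:' never matches when the file is read with the wrong codec. For the second option, Out-File accepts an -Encoding parameter (for example, -Encoding utf8). In Windows PowerShell that output may itself begin with a UTF-8 byte-order mark, in which case encoding="utf-8-sig" is the safer choice on the Python side.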

Upvotes: 1
