abhinav singh

Matching Popen output with a regex in Python

I am using Python's subprocess.Popen to run a command and capture its output in a variable:

    import logging
    from subprocess import PIPE, STDOUT, Popen

    av_proc = Popen(
        [
            CLAMSCAN_PATH,
            "--stdout",
            "-d",
            AV_DEFINITION_PATH,
            blob_full_path
        ],
        stderr=STDOUT,  # merge stderr into the captured stdout stream
        stdout=PIPE,
        env=av_env
    )
    communicateArray = av_proc.communicate()
    claimav_output = communicateArray[0].decode('utf-8')
    logging.info("clamAV output {}".format(claimav_output))
    logging.info("clamAV output Type {}".format(type(claimav_output)))
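
A minimal sketch of the same capture step that runs without ClamAV installed, assuming a small child Python process as a hypothetical stand-in for clamscan. The child writes one line to stderr and one ClamAV-style line to stdout; because stderr=STDOUT merges both streams into the same pipe, both lines end up in the single decoded str:

```python
import sys
from subprocess import PIPE, STDOUT, Popen

# Hypothetical stand-in for clamscan: a child Python process that writes a
# warning to stderr and a made-up detection line to stdout.
child_code = (
    "import sys\n"
    "print('LibClamAV Warning: sample warning', file=sys.stderr, flush=True)\n"
    "print('/tmp/test2: Worm.Mydoom.I FOUND')\n"
)

proc = Popen([sys.executable, "-c", child_code], stdout=PIPE, stderr=STDOUT)
# communicate() returns (stdout, stderr); stderr is None here because it
# was redirected into stdout rather than given its own pipe.
stdout_bytes, _ = proc.communicate()
output = stdout_bytes.decode("utf-8")

print(type(output))  # <class 'str'>
print(output)
```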

The log lines printed are:

[3/1/19 6:38:01 AM] clamAV output LibClamAV Warning: **************************************************
[3/1/19 6:38:01 AM] LibClamAV Warning: ***  The virus database is older than 7 days!  ***
[3/1/19 6:38:01 AM] LibClamAV Warning: ***   Please update it as soon as possible.    ***
[3/1/19 6:38:01 AM] LibClamAV Warning: **************************************************
[3/1/19 6:38:01 AM] /home/admin/Desktop/blob_folder/test2: Worm.Mydoom.I FOUND
[3/1/19 6:49:28 AM] clamAV output Type <class 'str'>

My goal is to extract the virus name from this output, which is 'Worm.Mydoom.I'.

The detection line will always be in this format:

Filepath: Detection_Name FOUND

I wrote the following Python code to extract the detection name:

    matchObj = re.match(r'(?<=:\s)\S+(?=\s+FOUND)', claimav_output)
    logging.info("matchObj  Type {}".format(type(matchObj)))
    logging.info(matchObj.group())
    logging.info(matchObj)

Unfortunately, this is not working. Can someone point out the error in my code? Here is what I see in the logs:

[3/1/19 6:38:01 AM] clamAV output Type <class 'str'>
[3/1/19 6:38:01 AM] CLAIMAV Parsing claimav output to extract any scan resullt 
[3/1/19 6:38:01 AM] matchObj  Type <class 'NoneType'>
[3/1/19 6:38:01 AM] 'NoneType' object has no attribute 'group'
[3/1/19 6:38:01 AM] Something went wrong

Answers (1)

Pushpesh Kumar Rajwanshi

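The regex itself is fine; the problem is re.match(), which only tries to match at the very start of the string. At position 0 the lookbehind (?<=:\s) can never succeed, so re.match() returns None and calling .group() on it fails. Use re.search() instead, which scans the whole string. With lookarounds, this regex captures just the virus name: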

(?<=:\s)\S+(?=\s+FOUND)
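
A minimal sketch contrasting the two calls on a shortened, made-up excerpt of the clamscan output (the variable names here are illustrative, not from the question). The only difference between the two attempts is re.match versus re.search:

```python
import re

# Same lookaround pattern as in the question.
PATTERN = r'(?<=:\s)\S+(?=\s+FOUND)'

# Shortened stand-in for the clamscan output: one warning line plus the detection line.
output = (
    "LibClamAV Warning: ***  The virus database is older than 7 days!  ***\n"
    "/home/admin/Desktop/blob_folder/test2: Worm.Mydoom.I FOUND\n"
)

# re.match only attempts position 0; the lookbehind needs ": " before that
# position, so the single attempt fails and None is returned.
print(re.match(PATTERN, output))  # None

# re.search tries each position in turn until the pattern matches.
m = re.search(PATTERN, output)
if m:  # check for None before calling .group()
    print(m.group())              # Worm.Mydoom.I
```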

Here is a runnable Python example:

import re

s = """[3/1/19 6:02:52 AM] clamAV output b'LibClamAV Warning: **************************************************\nLibClamAV Warning: * The virus database is older than 7 days! \nLibClamAV Warning: Please update it as soon as possible. *\nLibClamAV Warning: **************************************************\n/home/admin/Desktop/blob_folder/test2: Worm.Mydoom.I FOUND\n\n----------- SCAN SUMMARY -----------\nKnown viruses: 3832461\nEngine version: 0.100.2\nScanned directories: 0\nScanned files: 1\nInfected files: 1\nData scanned: 0.02 MB\nData read: 0.02 MB (ratio 1.00:1)\nTime: 6.519 sec (0 m 6 s)\n'

    [3/1/19 6:02:52 AM] clamAV output Type class 'bytes'"""

m = re.search(r'(?<=:\s)\S+(?=\s+FOUND)', s)
if m:
    print(m.group())

It prints the virus name:

Worm.Mydoom.I
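
If a scan can report more than one infected file, a variant with named capture groups can return every "Filepath: Detection_Name FOUND" line at once. This is a sketch under the assumption that each detection sits on its own line in exactly that format; parse_detections is a hypothetical helper name, and re.MULTILINE makes ^ and $ match at each line boundary:

```python
import re

# Assumes one "path: name FOUND" line per detection. The greedy path group
# backtracks to the last ": " on the line, so the name is the final token.
DETECTION_RE = re.compile(r'^(?P<path>.+): (?P<name>\S+) FOUND$', re.MULTILINE)


def parse_detections(scan_output):
    """Hypothetical helper: return (file_path, detection_name) tuples."""
    return [(m.group('path'), m.group('name'))
            for m in DETECTION_RE.finditer(scan_output)]


sample = (
    "LibClamAV Warning: ***   Please update it as soon as possible.    ***\n"
    "/home/admin/Desktop/blob_folder/test2: Worm.Mydoom.I FOUND\n"
    "\n"
    "----------- SCAN SUMMARY -----------\n"
    "Infected files: 1\n"
)

print(parse_detections(sample))
# [('/home/admin/Desktop/blob_folder/test2', 'Worm.Mydoom.I')]
```

An empty list means no FOUND line was present, which avoids the NoneType error without a separate check.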
