user1677894

Reputation: 43

Need to parse a log file in bash

I have a log file that contains a lot of text, most of it useless to me. A few lines in this log are important, and they follow this pattern:

 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000001 (NEEDED)                     Shared library: [ld.so.1]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]

The NEEDED keyword appears on every line that matters to me, and the string between [] is the part I need. I want to build a list of all those strings, without duplicates.

I've done this in Python, but it looks like Python is not available on the machine where I want to run the script, so I need to rework it in bash. I only know basic bash and haven't been able to find a solution.

The Python script I've used is:

import sys
import re


def testForKeyword(keyword, line):
    findStuff = re.compile(r"\b%s\b" % keyword, \
                                   flags=re.IGNORECASE)

    if findStuff.search(line):
        return True
    else:
        return False

# Get filename argument
if len(sys.argv) != 2:
    print("USAGE: python libraryParser.py <log_file.log>")
    sys.exit(-1)

file = open(sys.argv[1], "r")

sharedLibraries = []
for line in file:
    if testForKeyword("NEEDED", line):
        libraryNameStart = line.find("[") + 1
        libraryNameFinish = line.find("]")

        libraryName = line[libraryNameStart:libraryNameFinish]

        # No duplicates, only add if it does not exist
        try:
            sharedLibraries.index(libraryName)
        except ValueError:
            sharedLibraries.append(libraryName)

for library in sharedLibraries:
    print(library)

Can you please help me solve this? Thanks in advance.

Upvotes: 4

Views: 955

Answers (7)

Zsolt Botykai

Reputation: 51673

A sed solution might be:

sed -e '/(NEEDED)/!d' -e 's/\(.*\[\)\|\(\]$\)//g' INPUTFILE

Note: if the file has Windows line endings, use this instead:

sed -e '/(NEEDED)/!d' -e 's/\(.*\[\)\|\(\].$\)//g' INPUTFILE
  1. The first -e expression deletes every line that does not contain (NEEDED). The match is case-sensitive, so the keyword has to be written in upper case, as it appears in the log.
  2. The second expression deletes everything up to and including the last [, plus the closing ] at the end of the line. In the Windows variant it also removes the \r (carriage return) that sits before the \n. The \| alternation is a GNU sed extension.
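The command above prints every match, including repeats. As a rough sketch of how the same idea could also drop duplicates and cope with either line-ending style in one expression, something like the following might work. The function name needed_libs_sed is made up for illustration, and the approach assumes each interesting line contains exactly one [...] pair.

```shell
#!/bin/sh
# needed_libs_sed FILE (hypothetical helper name):
# print the unique library names found on "(NEEDED)" lines of FILE.
needed_libs_sed() {
    # -n suppresses the default output; the trailing "p" prints only the
    # lines where the substitution succeeded. The capture group keeps the
    # text between the last "[" and the following "]". The final ".*"
    # also swallows a trailing carriage return, if there is one.
    sed -n '/(NEEDED)/s/.*\[\([^]]*\)].*/\1/p' "$1" | sort -u
}
```

Calling needed_libs_sed INPUTFILE would then print one library name per line in sorted order. This version avoids \| and so does not depend on GNU sed.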

Upvotes: 1

MK.

Reputation: 34587

 awk '/NEEDED/ {gsub("[][]", ""); print $5}' < /tmp/1.txt  | sort -u

Upvotes: 1

arutaku

Reputation: 6107

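If your log is in a file called "log.txt", you can extract the names like this:

grep "(NEEDED)" log.txt | awk -F"\[" '{print substr($2, 1, length($2) - 1);}' - | sort -u

The awk step takes the text after the [ and drops its last character, the closing ]. sort -u then sorts the names and removes duplicate lines.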

Upvotes: 1

Fredrik Pihl

Reputation: 45670

$ awk -F'[][]' '/NEEDED/ {print $2}' data.txt | sort | uniq
ld.so.1
libc.so.6
libgcc_s.so.1
libm.so.6

awk only:

$ awk -F'[][]' '/NEEDED/ {save[$2]++}END{ for (i in save) print i}' data.txt
libc.so.6
libm.so.6
libgcc_s.so.1
ld.so.1
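Note that awk does not define the order in which for (i in save) visits the array, which is why the output above is not sorted. If the names should come out in the order they first appear in the log, as the Python script in the question does, a small variation might look like this. It is only a sketch; the function name needed_libs_ordered and the array name seen are chosen for illustration.

```shell
#!/bin/sh
# needed_libs_ordered FILE (hypothetical helper name):
# print unique library names in order of first appearance.
needed_libs_ordered() {
    # With "[" and "]" as field separators, $2 is the text between them.
    # seen[$2]++ evaluates to 0 the first time a name shows up, so
    # !seen[$2]++ is true exactly once per distinct name.
    awk -F'[][]' '/NEEDED/ && !seen[$2]++ { print $2 }' "$1"
}
```

Because each name is printed the moment it is first seen, no END block or sort is needed.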

Simplification of your Python code:

#!/usr/bin/env python

libs = set()

with open("data.txt") as fd:
    for line in fd:
        if "NEEDED" in line:
            # The fifth whitespace-separated field is "[name]"; strip the brackets
            libs.add(line.split()[4].strip("[]"))

for lib in libs:
    print(lib)

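Bash solution (does not remove duplicates):

#!/bin/bash

# Split each line on "[" and "]"; the library name is the second field
while IFS='][' read -r -a array
do
    # Only print entries from (NEEDED) lines
    if [[ ${array[0]} == *"(NEEDED)"* ]]; then
        echo "${array[1]}"
    fi
done < data.txt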
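If no external tools should be involved at all, duplicates can also be filtered in the shell itself. The following is a minimal sketch using only POSIX shell built-ins, so it should run under bash as well. The function name list_needed_libs is hypothetical, and the code assumes library names never contain spaces, since a space-delimited string is used to remember what has already been printed.

```shell
#!/bin/sh
# list_needed_libs FILE (hypothetical helper name):
# print each library named on a "(NEEDED)" line once, in order of first
# appearance, using only shell built-ins.
list_needed_libs() {
    seen=' '    # space-delimited list of names already printed
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"(NEEDED)"*"["*"]"*) ;;   # a line we care about
            *) continue ;;             # anything else is skipped
        esac
        lib=${line#*\[}     # drop everything up to and including the first "["
        lib=${lib%%\]*}     # drop the first "]" and everything after it
        case $seen in
            *" $lib "*) ;;  # already printed, skip it
            *)
                printf '%s\n' "$lib"
                seen="$seen$lib "
                ;;
        esac
    done < "$1"
}
```

It would be called as list_needed_libs data.txt. The variables are global because plain sh has no local keyword; in a bash-only script they could be declared with local instead.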

Upvotes: 6

Mandar Pande

Reputation: 12994

awk -F '[' ' /NEEDED/ { print $NF } ' file_name | sed 's/]//' | sort | uniq

Upvotes: 3

Thor

Reputation: 47219

With grep and coreutils:

grep NEEDED infile | grep -o '\[[^]]*\]' | tr -d '][' | sort | uniq

Output:

ld.so.1
libc.so.6
libgcc_s.so.1
libm.so.6

Upvotes: 3

Birei

Reputation: 36282

One way using awk, assuming infile contains the data from the question:

awk '
    $2 ~ /NEEDED/ { 
        lib = substr( $NF, 2, length($NF) - 2 ); 
        libs[ lib ] = 1;
    } 
    END { 
        for (lib in libs) { 
            printf "%s\n", lib;
        } 
    }
' infile

Output:

libc.so.6
libgcc_s.so.1
ld.so.1
libm.so.6

Upvotes: 3
