4673_j

Reputation: 497

Checking for a multi-line regex match with grep

I am having trouble matching a block of header details in a set of files that will be processed later. This applies to all .java files.

My assumption so far is that the match does not carry over to the next line (I might be wrong, of course). The pattern matches the block on regex101.com, but when I run the script it does not seem to continue onto the next line.

I'm using Cygwin under Win7.

Only the enabled pattern seems to match so far, but it also matches Example3, which I do NOT want; I only want to match Example1 and Example2.

This is my script so far:

#!/bin/bash
# Script START - Info
printf "Search for header with X details - START\n"

# Get the total files
FILES_TOTAL=$(find . -name '*.java' | wc -l)
printf "Files to process: $FILES_TOTAL\n"

# Total nr. of various files
COUNTER_N=0
COUNTER_Y=0

# Set the files to be manipulated (all .java files)
SEARCH=$(find . | grep "\/uk\/" | grep "\.java$")

# Set the pattern for the header to search for
PATTERN='(.*DIGITAL.*)'
# PATTERN="(.*DIGITAL.*)"

############ THE PATTERN IS INCOMPLETE, FOR SOME REASON THE OTHER PATTERNS DO NOT WORK,
############ IT DOESN'T SEEM TO WORK THE NEW LINE/FEED
# PATTERN='(\/\*\*\r\n)(.*DIGITAL)'
# PATTERN="(\/\*\*\r\n)(.*DIGITAL)"
# PATTERN='(.*DIGITAL.*\n)(.*MILAN.*\n)(.*STOCK.*\n)(.*TEL.*\n)'
# PATTERN="(.*DIGITAL.*\n)(.*MILAN.*\n)(.*STOCK.*\n)(.*TEL.*\n)"
# PATTERN='(\/\*\*\n)(.*DIGITAL.*\n)(.*MILAN.*\n)(.*STOCK.*\n)(.*TEL.*\n)((.*\*\n?(\/?)){0,})'
# PATTERN="(\/\*\*\n)(.*DIGITAL.*\n)(.*MILAN.*\n)(.*STOCK.*\n)(.*TEL.*\n)((.*\*\n?(\/?)){0,})"
# PATTERN='(\/\*\*\n)(.*DIGITAL.*\n)(.*MILAN.*\n)(.*STOCK.*\n)(.*TEL.*\n)((.*\*\n?(\/?)){0,})/g'
# PATTERN="(\/\*\*\n)(.*DIGITAL.*\n)(.*MILAN.*\n)(.*STOCK.*\n)(.*TEL.*\n)((.*\*\n?(\/?)){0,})/g"

# For each .java file found
while IFS= read -r file; do
    # Skip the empty line produced when nothing was found
    [ -n "$file" ] || continue

    # Process files
    if egrep -q "$PATTERN" "$file"; then
        printf 'Has the header: %s\n' "$file"
        let COUNTER_Y=COUNTER_Y+1
    else
        # printf 'Does NOT have the header: %s\n' "$file"
        let COUNTER_N=COUNTER_N+1
    fi

    # Update nr. of files
    let FILES_PROCESSED=COUNTER_Y+COUNTER_N
done <<< "$SEARCH"

# Script END - Info/Report
printf "Search for header with X details - END\n"
printf 'Files - NO header: %s\n' "$COUNTER_N"
printf 'Files - YES header: %s\n' "$COUNTER_Y"
printf 'Total files processed: %s\n' "$FILES_PROCESSED"

It matches perfectly for what I want (Example1 & Example2) on the web, but it does not work in the script! There is a sample file and the regex matching the block here: https://regex101.com/r/kG5iK7/2

What's going on here?! Any help is much appreciated.

Upvotes: 0

Views: 1553

Answers (1)

4673_j

Reputation: 497

The main issue was matching a multi-line pattern. This did the trick:

if grep -qPz "$PATTERN" "$file"; then

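-P enables Perl-compatible regular expressions (PCRE) in grep, so that escapes such as \n in the pattern denote a newline.

-z treats the input as NUL-terminated records instead of newline-terminated lines, so a file without NUL bytes is read as a single record and the pattern can match across line breaks.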
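
To illustrate the difference the -z flag makes, here is a minimal sketch. It assumes GNU grep built with PCRE support; the sample file contents and the variable names (tmp, pattern, line_mode, null_mode) are made up for the demonstration and are not taken from the original files.

```bash
#!/bin/bash
# Hypothetical sample file with a header spread over several lines.
tmp=$(mktemp)
printf '/**\n * DIGITAL\n * MILAN\n */\nclass A {}\n' > "$tmp"

# A pattern that has to cross a line break to match.
pattern='DIGITAL.*\n.*MILAN'

# Without -z, grep tests one line at a time, so \n has nothing to match.
if grep -qP "$pattern" "$tmp"; then line_mode=match; else line_mode=no-match; fi

# With -z, the whole file is a single record, so \n can match.
if grep -qPz "$pattern" "$tmp"; then null_mode=match; else null_mode=no-match; fi

printf 'line mode: %s\nnull-data mode: %s\n' "$line_mode" "$null_mode"
rm -f "$tmp"
```

If the files use Windows line endings, which is plausible under Cygwin, a pattern part such as `\/\*\*\n` may need to be written as `\/\*\*\r?\n`, because the line break is then \r\n rather than \n.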

Thanks to @Charles Duffy for the reminder about good coding practices.
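
The comment in question is not reproduced here, so the following is only a guess at the kind of practice meant: iterating over NUL-delimited find output instead of a whitespace-split string, and passing values to printf as arguments. The function name count_matching and its parameters are hypothetical, and the sketch assumes bash and GNU grep with PCRE support.

```bash
#!/bin/bash
# Hypothetical helper: prints how many .java files under a directory
# contain a match for a (possibly multi-line) PCRE pattern.
count_matching() {
    local dir=$1 pattern=$2 count=0 file
    # -print0 together with read -d '' keeps names with spaces intact.
    while IFS= read -r -d '' file; do
        if grep -qPz -- "$pattern" "$file"; then
            count=$((count + 1))
        fi
    done < <(find "$dir" -type f -name '*.java' -print0)
    printf '%s\n' "$count"
}
```

It could be called as `count_matching . "$PATTERN"`. The loop runs in the current shell because of the process substitution, so the counter is not lost in a subshell as it would be with `find ... | while read`.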

Upvotes: 1
