Sri

Reputation: 613

Bash: How to search a string in file with Regex and get the associated value

I have a file with some patterns (regex) and a corresponding value for each pattern, laid out as follows:

path                  group
/hello/get/**         @group1
/hey/get/you          @group2
/hi/get/ping_*.js     @group3
/hello/get/**         @group4

I want to get the corresponding group value for the path I have given. For example if I give "/hello/get/book.js" I should get @group1.

How can I do that?

I have tried searching for the regex, but I am not sure how to fetch the corresponding group from the file. Also, grep returns the matching line when the path matches exactly, but not when it only matches the pattern. For example, when I run

grep '/hey/get/you' FILENAME 

I get the following output: /hey/get/you @group2

But, if I give the following:

grep '/hello/get/hello.js' FILENAME

it doesn't return anything.

The expected result for the string '/hello/get/hello.js' should be @group1, @group4

Upvotes: 2

Views: 936

Answers (2)

pjh

Reputation: 8064

If I understand the question correctly, you want code that will read a list of pattern-group pairs from a file (say 'pattern_group_list.txt'), input a string (say from the command line), and print a string containing a comma-separated list of the groups corresponding to the patterns in the file that match it. If that is the case, try this code:

#! /bin/bash

readonly kPATTERN_GROUP_FILE=pattern_group_list.txt

input=$1

{
    read -r pattern group || exit 0    # Skip the first line (header)
    result=
    while read -r pattern group ; do
        [[ $input == $pattern ]] && result+=${result:+,}$group
    done
} <"$kPATTERN_GROUP_FILE"

printf '%s\n' "$result"
  • The code is not completely Shellcheck-clean because $pattern is not quoted in [[ $input == $pattern ]], but quoting it would break the code by preventing glob patterns from being matched.
  • It prints '@group2' when run with argument '/hey/get/you' and it prints '@group1,@group4' when run with argument '/hello/get/hello.js'.
  • The code will not work if patterns contain whitespace characters. You would need a different file format to support such patterns.
  • The last pattern-group pair in the file will be missed if the last line of the file is not terminated. See Read last line of file in bash script when reading file line by line for an explanation of the problem, and how to fix it if it is a concern for you.
  • If the file is empty the code exits immediately with good status. You would probably want to do something different in practical code.
  • The code makes no attempt to print useful error messages for a non-existent or unreadable input file. Practical code would handle such errors.
  • Bash is generally very slow. If you've got 5k+ patterns to match, don't expect to be able to process large numbers of files in reasonable amounts of time. I'd expect it to be bearable up to maybe 1k files. Beyond that you would really need to use a more efficient programming language.
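
The caveats in the list above about an unterminated last line and an unreadable file can be addressed with small changes. This is a minimal sketch, assuming the same two-column file format as in the question. The function name lookup_groups and its argument order are made up for illustration and are not part of the original script:

```shell
#!/bin/bash

# lookup_groups FILE PATH   (hypothetical helper, for illustration only)
# Prints a comma-separated list of the groups whose glob pattern matches PATH.
lookup_groups() {
    local file=$1 input=$2 pattern group result=

    # Fail with a message instead of a bare redirection error.
    if [[ ! -r $file ]]; then
        printf 'cannot read pattern file: %s\n' "$file" >&2
        return 1
    fi

    {
        read -r pattern group    # skip the header line
        # '|| [[ -n $pattern ]]' keeps a final line that has no trailing newline
        while read -r pattern group || [[ -n $pattern ]]; do
            [[ -n $pattern ]] || continue    # ignore blank lines
            # $pattern is deliberately unquoted so it is treated as a glob
            [[ $input == $pattern ]] && result+=${result:+,}$group
        done
    } <"$file"

    printf '%s\n' "$result"
}
```

Called as lookup_groups pattern_group_list.txt /hello/get/hello.js, it should behave like the script above for the sample file.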

Upvotes: 1

Gilles Quénot

Reputation: 185053

These are not regexes but glob patterns. For ** to match across directory levels during pathname expansion, enable bash's globstar option with

shopt -s globstar

An implementation that uses these globs to find the file /tmp/test/hello/get/hello.js. First, create the directory tree from the paths in the file:

awk -F/ 'BEGIN{OFS="/"}NR>1{$(NF)=""; print}' /tmp/file |
    xargs -I% mkdir -p /tmp/test/%

tree

$ tree /tmp/test
/tmp/test
├── hello
│   └── get
├── hey
│   └── get
└── hi
    └── get

creating the file

touch /tmp/test/hello/get/hello.js

dynamic glob matching

$ awk 'NR>1{print $1, $2}' /tmp/file |
    while read -r r x; do
        # $r is left unquoted on purpose so the shell expands it as a glob
        stat /tmp/test$r &>/dev/null && echo "$x"
    done

output

@group1
@group4
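
Note that stat succeeds whenever a pattern expands to any existing path, so the loop above gives the expected answer only while hello.js is the sole file under /tmp/test. The following sketch compares each expansion against one specific target instead. The helper name groups_for_existing_file and its three arguments are assumptions made for illustration, and nullglob is enabled so that patterns without matches expand to nothing:

```shell
#!/bin/bash
# Both options change pathname expansion for the whole shell session.
shopt -s globstar nullglob

# groups_for_existing_file ROOT LIST TARGET   (hypothetical helper)
#   ROOT    directory the patterns are resolved under, e.g. /tmp/test
#   LIST    the pattern/group file, including its header line
#   TARGET  path as written in the question, e.g. /hello/get/hello.js
groups_for_existing_file() {
    local root=$1 list=$2 target=$3 pattern group candidate

    # A glob can only find the target if it exists on disk.
    [[ -e $root$target ]] || return 1

    {
        read -r pattern group    # skip the header line
        while read -r pattern group || [[ -n $pattern ]]; do
            # $pattern is unquoted on purpose so the shell expands it as a glob
            for candidate in "$root"$pattern; do
                if [[ $candidate == "$root$target" ]]; then
                    printf '%s\n' "$group"
                    break
                fi
            done
        done
    } <"$list"
}
```

With the tree and file from above, groups_for_existing_file /tmp/test /tmp/file /hello/get/hello.js prints one group per line for each pattern whose expansion contains that file.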

doc

man 7 glob
globstar
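
To see what globstar changes, the following sketch expands the same ** pattern with the option off and then on. The directory layout (get/hello.js and get/sub/deep.js under a temporary directory) is invented for the demonstration:

```shell
#!/bin/bash

# Build a throwaway tree: get/hello.js and get/sub/deep.js
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/get/sub"
touch "$demo_dir/get/hello.js" "$demo_dir/get/sub/deep.js"

# Without globstar, ** behaves like a single *: one directory level only.
shopt -u globstar
without=( "$demo_dir"/get/** )

# With globstar, ** also descends into subdirectories.
shopt -s globstar
with=( "$demo_dir"/get/** )

printf 'without globstar: %d matches\n' "${#without[@]}"
printf 'with globstar:    %d matches\n' "${#with[@]}"

# List the globstar matches relative to the demo directory.
printf '%s\n' "${with[@]#"$demo_dir"}"
```

Only the second expansion should include get/sub/deep.js, which is the difference that matters when a path in the question's file is more than one level below the ** component.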

Upvotes: 1
