Reputation: 53

Extract all matching substrings in bash

I'm looking for a solution in bash (it will be part of a larger script).

Given a variable containing information of the form

diff -r efb93662e8a7 -r 53784895c0f7 diff.txt
--- diff.txt Fri Jan 23 14:48:30 2009 +0000
+++ b/diff.txt Fri Jan 23 14:49:58 2009 +0000
@@ -1,9 +0,0 @@ 
-diff -r 9741ec300459 myfile.c 
---- myfile.c Thu Aug 21 18:22:17 2008 +0000 
-+++ b/myfile.c Thu Aug 21 18:22:17 2008 +0000
-@@ -1,4 +1,4 @@ 
-  int myfunc() 
-  { 
--     return 1; 
-+     return 10; 
-  }

I want to extract all of the filenames (here diff.txt and myfile.c, though future inputs may contain more) into a single string of the form "edited: filename1 filename2 ... filenameN".

To clarify: I want every matching filename, not just one, collected into that string.

The command "$(expr "$editing" : '.*---[[:space:]]\([[:graph:]]*\)[[:space:]]')" returns the last filename correctly, but not the earlier ones.
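Why only the last one comes back: expr anchors its regular expression at the start of the string, and the leading .* is greedy, so it swallows everything up to the final "---" before the \(...\) group gets a chance to match. A minimal sketch on a made-up one-line string (sample is a hypothetical name), assuming GNU or BSD expr:

```shell
#!/bin/bash
# Hypothetical one-line input with two "--- name" occurrences.
sample='diff.txt --- a.txt Fri --- b.c Thu'

# expr anchors the BRE at the start of the string; the greedy ".*"
# consumes everything up to the LAST "---", so \1 captures only "b.c".
expr "$sample" : '.*---[[:space:]]\([[:graph:]]*\)[[:space:]]'
```

Making .* match less does not help either, because expr returns a single match, so collecting every name needs some kind of loop.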

EDIT: I also need to identify edited filenames that may contain spaces, i.e. the text appearing after "---" and before the day name ("Fri", "Thu", ...).
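For the EDIT requirement, one pure-bash possibility is to go line by line and let [[ =~ ]] capture everything between "--- " and the day name. This is a sketch, assuming $editing still contains its newlines (one header per line) and that the timestamp always starts with a three-letter English day name; extract_edited is a hypothetical helper, and the sample below is the question's text with its line breaks kept:

```shell
#!/bin/bash
# Hypothetical helper: collect names that follow "---" and precede a day name.
# Assumes the diff text still has its newlines (one header per line).
extract_edited() {
    local line result="edited:"
    # ERE: optional extra leading "-" (nested diffs), then "--- ",
    # then the name (spaces allowed), then " Mon|Tue|... ".
    local re='^-*--- (.+) (Mon|Tue|Wed|Thu|Fri|Sat|Sun) '
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            result+=" ${BASH_REMATCH[1]}"
        fi
    done <<< "$1"
    printf '%s\n' "$result"
}

editing='diff -r efb93662e8a7 -r 53784895c0f7 diff.txt
--- diff.txt Fri Jan 23 14:48:30 2009 +0000
+++ b/diff.txt Fri Jan 23 14:49:58 2009 +0000
@@ -1,9 +0,0 @@
-diff -r 9741ec300459 myfile.c
---- myfile.c Thu Aug 21 18:22:17 2008 +0000
-+++ b/myfile.c Thu Aug 21 18:22:17 2008 +0000
-@@ -1,4 +1,4 @@
-  int myfunc()
-  {
--     return 1;
-+     return 10;
-  }'

extract_edited "$editing"
```

Working line by line is what keeps the greedy (.+) from spanning several headers; if the newlines have already been lost, the ^ anchor means this sketch finds nothing.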

Thanks for your help (and to the many people who have replied so far).

Upvotes: 5

Answers (4)

Colas Nahaboo

Reputation: 769

A solution using only bash built-ins, with no external programs:

res="edited:"; var="${var#* --- } --- "
while test -n "$var";do res="$res ${var%% *}"; var="${var#* --- }";done
echo "$res"

It iterates over all occurrences of " --- ". The trick is to prepare the string first: trim the garbage at the start (up to the first " --- ") and append a " --- " at the end, which keeps the logic of the while loop simple.

This relies on one of bash's most useful features: the # and % parameter expansions, which trim strings.
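For readers new to these expansions: # and ## remove the shortest and longest matching prefix, while % and %% remove the shortest and longest matching suffix. A small illustration on made-up strings (s and t are hypothetical names):

```shell
#!/bin/bash
s='a --- b --- c'
t='diff.txt Fri Jan 23'

echo "${s#* --- }"   # shortest prefix matching "* --- " removed -> b --- c
echo "${s##* --- }"  # longest prefix matching "* --- " removed  -> c
echo "${s% --- *}"   # shortest suffix matching " --- *" removed -> a --- b
echo "${s%% --- *}"  # longest suffix matching " --- *" removed  -> a
echo "${t%% *}"      # everything from the first space removed   -> diff.txt
```

The last line is the ${var%% *} step in the loop: it keeps only the first word after each " --- ".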

Upvotes: 4

user50264

Reputation: 545

Here is a simple, working solution:

txt=$(cat)
str="edited:"

for word in $txt; do
        if printf '%s\n' "$word" | grep -qi '^[a-z0-9_-]*\.[a-z]*$'; then
           str="$str $word"
        fi
done

echo "$str"
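The loop above starts one grep process per word. A sketch of the same filter using bash's built-in [[ =~ ]] instead, with globbing switched off so a stray "*" in the diff is not expanded against the current directory; filter_words is a hypothetical name, and A-Za-z roughly stands in for grep -i:

```shell
#!/bin/bash
# Hypothetical helper: keep words that look like "name.ext", no grep needed.
filter_words() {
    local txt=$1 word str="edited:"
    local re='^[A-Za-z0-9_-]*\.[A-Za-z]*$'   # A-Za-z replaces grep -i
    set -f                                    # disable globbing while splitting
    for word in $txt; do
        [[ $word =~ $re ]] && str+=" $word"
    done
    set +f                                    # restore globbing
    printf '%s\n' "$str"
}
```

Called as filter_words "$(cat)", it should behave like the original script, including the duplicate names in its output.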

Running it:

anton@CAPTAIN-FALCON ~/Desktop
$ bash sol.sh
diff -r efb93662e8a7 -r 53784895c0f7 diff.txt --- diff.txt Fri Jan 23 14:48:30 2009 +0000 +++ b/diff.txt Fri Jan 23 14:49:58 2009 +0000 @@ -1,9 +0,0 @@ -diff -r 9741ec300459 myfile.c ---- myfile.c Thu Aug 21 18:22:17 2008 +0000 -+++ b/myfile.c Thu Aug 21 18:22:17 2008 +0000 -@@ -1,4 +1,4 @@ - int myfunc() - { -- return 1; -+ return 10; - }
edited: diff.txt diff.txt myfile.c myfile.c

Edit: Experimenting with grep for a while produced the following script, but I'm starting to wonder whether pure bash is the right tool for the job. It seems there would be many corner cases where you either miss some files or get erroneous filenames.

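#! /bin/bash

# -z makes grep treat the whole input as one record; tr turns the NUL
# separators grep -z prints into newlines so bash doesn't drop them.
rawFiles=$(grep -ioz ' -* [a-z0-9_ -]*\.[a-z]*' | tr '\0' '\n')

for file in $rawFiles; do
   if ! printf '%s\n' "$file" | grep -q '^-*$'; then
      files="$files${file} "
   fi
done

echo "edited: $files"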

Upvotes: 1

David Z

Reputation: 131600

I'd suggest using an external tool for it - here's one way with perl:

result=$(echo "$variable" | perl -e 'print "edited:"; while (<>) { while (/--- (\S+)/g) { print " $1"; } }')

I'm sure it can be done more elegantly, but I can't think of a way right now that wouldn't take a more substantial program.

Upvotes: 3

Douglas Leeder

Reputation: 53310

Could you do the extraction before setting $editing? Then you might still have the line breaks.

Then maybe some sed would be able to extract the filenames.
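If the line breaks can be kept, a sed sketch along these lines could pull out the names. It assumes each "---" header sits on its own line and that the timestamp begins with an English day name; names_from_diff and edited_line are hypothetical helpers, and -E is the extended-regex flag accepted by both GNU and BSD sed:

```shell
#!/bin/bash
# Hypothetical helper: print one name per line, taken from the text between
# a leading "---" (optionally preceded by more dashes) and the day name.
names_from_diff() {
    sed -nE 's/^-*--- (.+) (Mon|Tue|Wed|Thu|Fri|Sat|Sun) .*/\1/p' <<< "$1"
}

# Hypothetical helper: join the extracted names into one "edited: ..." line.
edited_line() {
    local name result="edited:"
    while IFS= read -r name; do
        result+=" $name"
    done < <(names_from_diff "$1")
    printf '%s\n' "$result"
}
```

With the question's text (line breaks intact) in $editing, edited_line "$editing" prints "edited: diff.txt myfile.c".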

Upvotes: 0
