Reputation: 10784
I'm documenting a shell script on a server at my job. It takes a series of files whose names start with "dat" and performs a particular task on all of them. The problem is that the script filters the files using a regular expression with the sed command, as follows:
namecmp=`grep -l $name dat*.p |sed -e "s/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g"| sort -t '.' -k 1.7,1.14 |sed -e "s/^\(......\)\(....\)\(..\)\(..\)\(.*\)/\1\4\3\2\5/g" | tail -1 `
I don't understand exactly how this regular expression filters the files. It would help to see the expected output, or examples of file names that the expression filters.
Is there a way to find the possible strings that are accepted by that expression?
Upvotes: 1
Views: 1770
Reputation: 10039
echo "1111112233444456789" | sed -e "s/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g"
-> 1111114444332256789
explain:
Begin  111111      22      33      4444      56789
^      \(......\)  \(..\)  \(..\)  \(....\)  \(.*\)
       \1          \2      \3      \4        \5
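To make the five captures visible, the same pattern can be run with a replacement that separates the groups instead of reordering them. This is only an illustrative sketch; the function name show_groups is hypothetical and not part of the original script:

```shell
#!/bin/sh
# Same pattern as in the explanation, but the replacement puts "|" between
# the groups instead of swapping them, so each capture can be read off.
show_groups() {
  sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1|\2|\3|\4|\5/'
}

echo "1111112233444456789" | show_groups
# prints: 111111|22|33|4444|56789
```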
optimization:
\(.*\) is not needed, in which case the corresponding \5 must be removed as well. The g flag is also not needed: only one substitution is possible, because ^ anchors the match to the start of the string.
Upvotes: 0
Reputation: 59416
grep -l searches in a list of files (dat*.p) for a regular expression ($name in your case, or better: whatever $name evaluates to) and then prints only the names of the files in which it was found.
These file names are then passed through the sed command, which replaces (s for substitute) something, namely ^\(......\)\(..\)\(..\)\(....\)\(.*\), by \1\4\3\2\5 (so it just regroups parts of the file names). The transformed file names are then passed to sort, and then to sed again, which just seems to undo the regrouping of the file name.
Finally, just the last file name is taken (tail -1) and all the rest is thrown away. This can be achieved a lot cheaper than by sorting all the file names, but who cares ;-)
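One way the cheaper route might look is a single pass that remembers only the largest key seen so far. This is a sketch under assumptions, not something the script or this answer spells out: the function name latest_single_pass is hypothetical, the date is assumed to sit at characters 7 to 14 as DDMMYYYY, names shorter than 14 characters are skipped, and ties keep the first name seen, which may differ from what the sort-based line would pick.

```shell
#!/bin/sh
# Hypothetical single-pass alternative: read names from stdin, build a
# YYYYMMDD key from characters 7-14 (assumed DDMMYYYY), keep the largest.
latest_single_pass() {
  awk '
    length($0) >= 14 {
      # characters 11-14 = year, 9-10 = month, 7-8 = day
      key = substr($0, 11, 4) substr($0, 9, 2) substr($0, 7, 2)
      if (best == "" || key > best_key) { best = $0; best_key = key }
    }
    END { if (best != "") print best }
  '
}

# Invented sample names standing in for the output of grep -l.
printf '%s\n' dat00131122013.p dat00101022014.p dat00115062012.p |
  latest_single_pass
```

With these three invented names the function prints dat00101022014.p, the one dated 01-02-2014.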
Effectively, this line finds the name of the "last" file matching the regexp in $name. The meaning of "last" is determined by the sorting of the file names after regrouping; judging by the size of the groups, I think a time stamp is changed from DDMMYYYY to YYYYMMDD, which makes sense in a way.
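To try this without touching real files, the pipeline can be fed invented names in place of the grep -l output. A minimal sketch: the function name latest_name and the sample names are hypothetical, and the layout of six leading characters followed by a DDMMYYYY date is assumed from the group sizes.

```shell
#!/bin/sh
# Hypothetical helper: the same sed | sort | sed | tail stages as the
# original line, but reading candidate names from stdin.
latest_name() {
  sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/' |
    sort -t '.' -k 1.7,1.14 |
    sed -e 's/^\(......\)\(....\)\(..\)\(..\)\(.*\)/\1\4\3\2\5/' |
    tail -1
}

# Invented names: "dat001", then DDMMYYYY, then ".p".
printf '%s\n' dat00131122013.p dat00101022014.p dat00115062012.p | latest_name
# prints: dat00101022014.p
```

The name dated 01-02-2014 comes out last because, after regrouping, its key 20140201 sorts after 20131231 and 20120615.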
Upvotes: 3
Reputation: 30273
There are libraries designed to do that (e.g. Xeger) but for this I can just provide you with an example:
abcdef02122014foobarfoobarfoobar
^     ^ ^ ^   ^
|     | | |   |
1     2 3 4   5
becomes
abcdef20141202foobarfoobarfoobar
^     ^   ^ ^ ^
|     |   | | |
1     4   3 2 5
and then I don't know what the sort does, but the next sed simply puts all of the above back in order.
So it seems the regular expressions are used to temporarily change the format of lines for sorting, before restoring the original format.
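As for the sort step: -t '.' makes the dot the field separator, and -k 1.7,1.14 restricts the sort key to characters 7 through 14 of the first field, which is where the regrouped date sits. The sketch below uses invented names and a hypothetical wrapper called by_date_key to show why the regrouping matters for that key:

```shell
#!/bin/sh
# Hypothetical wrapper around the sort invocation from the question.
by_date_key() {
  sort -t '.' -k 1.7,1.14
}

# Keys still in DDMMYYYY order: 31122013 sorts after 01022014,
# so the older file (31-12-2013) ends up last.
printf '%s\n' dat00131122013.p dat00101022014.p | by_date_key | tail -1
# prints: dat00131122013.p

# Keys regrouped to YYYYMMDD: 20140201 sorts after 20131231,
# so the newer file ends up last.
printf '%s\n' dat00120131231.p dat00120140201.p | by_date_key | tail -1
# prints: dat00120140201.p
```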
Upvotes: 1