Sopalajo de Arrierez

AWK: What is the regular expression to match any string?

I assume the regular expression would be the same in any tool (e.g. grep), but I am testing it with awk.

Case example:

$ cat Test.csv
2018-03-31,22:39,Test,2,4,2
2018-03-31,22:40,Test02,2,4,2
2018-03-31,22:40,Test03,2,4,2
2018-03-31,22:40,Test04,2,4,2
2018-03-31,22:59,Test03,5,4,2
2018-03-31,23:00,Test07,6,4,2
2018-03-31,23:00,Test08,2,2,2

I would like to know how to match any value (any string) in, say, field 3:

$ awk -F ',' '$3 == *' Test.csv
awk: syntax error at source line 1
 context is
        $3 == >>>  * <<<
awk: bailing out at source line 1

-

$ awk -F ',' '$3 == .*' Test.csv
awk: syntax error at source line 1
 context is
        $3 == >>>  . <<< *
awk: bailing out at source line 1

-

$ awk -F ',' '$3 == /*/' Test.csv
awk: illegal primary in regular expression * at
 source line number 1
 context is
        $3 == >>>  /*/ <<<

-

$ awk -F ',' '$3 == /.*/' Test.csv
[No results]

-

$ awk -F ',' '$3 == /^*/' Test.csv
[No results]

Even though there may be other ways to solve the problem, how can I use a regular expression in AWK to match every possible string in a specific field (assuming CSV input)?

If possible, the method should also match the null (empty) string, for empty fields like 2018-03-31,23:00,,2,2,2, so that AWK returns every line.


Why do I need this?

(added on request; a bit awkward to explain, sorry; only for those interested)

Basically, to simplify the code (for readability). My shell script is structured so that I would prefer to run the search like this:

awk -F ',' \
    -v AL__AWK="$AL" -v VL__AWK="$VL" -v DL__AWK="$DL" \
    -v Codigo__AWK="$Codigo" -v SubCodigo__AWK="$SubCodigo" \
    '$4 == AL__AWK && $5 == VL__AWK && $6 == DL__AWK && $8 ~ Codigo__AWK && $9 ~ SubCodigo__AWK' \
    "$LogFile"

As can be seen (or so I hope), the field matching in the CSV file is driven by variables. Some of these variables are initialized with a fixed value and some come from input parameters. Having a generic "any string" value would save me from writing this AWK line in several different ways.

For example: sometimes the script receives Codigo=Ptt as an input parameter, and sometimes Codigo is not set by any parameter; in that second case I set Codigo=".*", so the AWK line above still works unchanged.

Sorry, but the complete explanation is very long.
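A minimal sketch of the Codigo=".*" fallback, assuming a POSIX shell. The function name filter_field3 and its arguments are hypothetical, not part of the original script. The "${2:-.*}" expansion substitutes ".*" when no pattern (or an empty one) is passed, so a single awk line serves both cases:

```shell
# Hypothetical helper: print the lines of CSV file $1 whose 3rd field
# matches the regexp given in $2. With no $2, "${2:-.*}" falls back to
# ".*", which matches any string, including an empty field.
filter_field3() {
    awk -F ',' -v pat__AWK="${2:-.*}" '$3 ~ pat__AWK' "$1"
}
```

For example, filter_field3 Test.csv Test03 would print only the Test03 lines, while filter_field3 Test.csv would print every line. Quoting the expansion keeps the shell from treating .* as a filename glob.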

Answers (2)

Sopalajo de Arrierez

As easy as this (note the ~ match operator instead of ==):

$ awk -F ',' '$3 ~ /.*/' Test.csv
2018-03-31,22:39,Test,2,4,2,,,
2018-03-31,22:40,Test02,2,4,2,,,
2018-03-31,22:40,Test03,2,4,2,,,
2018-03-31,22:40,Test04,2,4,2,,,
2018-03-31,22:59,Test03,5,4,2,,,
2018-03-31,23:00,Test07,6,4,2,,,
2018-03-31,23:00,Test08,2,2,2,,,
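Why the == attempts in the question printed nothing: in awk, == only compares two values, and a bare /regexp/ used as an operand is shorthand for ($0 ~ /regexp/), which evaluates to 1 or 0. So $3 == /.*/ compares field 3 with the number 1. A small sketch on two made-up input lines (not taken from Test.csv) shows the difference:

```shell
# Two made-up CSV lines: field 3 is "Test" in the first and "1" in the second.
sample() { printf '%s\n' 'a,b,Test' 'a,b,1'; }

# /.*/ here means ($0 ~ /.*/), i.e. 1, so this is really $3 == 1:
# it prints only "a,b,1".
sample | awk -F ',' '$3 == /.*/'

# ~ applies the regexp to field 3 itself, so both lines are printed.
sample | awk -F ',' '$3 ~ /.*/'
```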

Without knowing the reason for this kind of search (the original poster said it was hard to describe), it may look like a purely academic question, but someone will probably need it someday.

For the case described in the question, where the pattern comes from a variable:

$ foo="03"
$ awk -F ',' -v foo__AWK="$foo" '$3 ~ foo__AWK' Test.csv
2018-03-31,22:40,Test03,2,4,2,,,
2018-03-31,22:59,Test03,5,4,2,,,

$ foo=".*"
$ awk -F ',' -v foo__AWK="$foo" '$3 ~ foo__AWK' Test.csv
2018-03-31,22:39,Test,2,4,2,,,
2018-03-31,22:40,Test02,2,4,2,,,
2018-03-31,22:40,Test03,2,4,2,,,
2018-03-31,22:40,Test04,2,4,2,,,
2018-03-31,22:59,Test03,5,4,2,,,
2018-03-31,23:00,Test07,6,4,2,,,
2018-03-31,23:00,Test08,2,2,2,,,

$ foo=""
$ awk -F ',' -v foo__AWK="$foo" '$3 ~ foo__AWK' Test.csv
2018-03-31,22:39,Test,2,4,2,,,
2018-03-31,22:40,Test02,2,4,2,,,
2018-03-31,22:40,Test03,2,4,2,,,
2018-03-31,22:40,Test04,2,4,2,,,
2018-03-31,22:59,Test03,5,4,2,,,
2018-03-31,23:00,Test07,6,4,2,,,
2018-03-31,23:00,Test08,2,2,2,,,

So .* works just like the empty string "" as a "match anything" regexp.
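The question also asked about empty fields such as 2018-03-31,23:00,,2,2,2. A quick check on a hypothetical three-line sample (fed through printf instead of Test.csv) shows that ".*" and "" both keep a line whose third field is empty, while "03" does not:

```shell
# Hypothetical sample; field 3 of the last line is empty.
sample() {
    printf '%s\n' '2018-03-31,22:40,Test03,2,4,2' \
                  '2018-03-31,23:00,Test07,6,4,2' \
                  '2018-03-31,23:00,,2,2,2'
}

# "03" keeps only the Test03 line; ".*" and "" keep all three,
# including the one whose third field is empty.
for foo in '03' '.*' ''; do
    printf 'foo="%s"\n' "$foo"
    sample | awk -F ',' -v foo__AWK="$foo" '$3 ~ foo__AWK'
done
```

This works because .* can match zero characters, and an empty regexp matches at the start of any string, so neither needs the field to contain anything.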

Ed Morton

@SopalajodeArrierez now that I see your rationale, I understand what you're trying to do: you don't need to come up with a regexp that matches any string, because your unset shell variable already IS such a regexp. Look:

$ echo 'a' | awk -v x='.*' '$1 ~ x'
a

$ echo 'a' | awk -v x='' '$1 ~ x'
a

So if your shell variable is unset, just leave it unset: in a regexp comparison the null regexp matches (at the start of) any string, so every record passes.
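The same behaviour with a genuinely unset variable, as in the question's script. This is a sketch on made-up rows; "${Codigo-}" is used instead of "$Codigo" only so that it also works if the script runs under set -u. Note that the trick helps only the fields compared with ~; fields compared with == (like $4, $5 and $6 in the question's command) still need an exact value:

```shell
# Made-up rows; the second one has an empty 2nd field.
rows() { printf '%s\n' 'a,Ptt' 'b,' 'c,Other'; }

unset Codigo   # simulate "no Codigo parameter was given"

# "${Codigo-}" expands to "" (and is safe under set -u), so awk gets an
# empty dynamic regexp, which matches every record: all three rows print.
rows | awk -F ',' -v Codigo__AWK="${Codigo-}" '$2 ~ Codigo__AWK'

# With == the empty value is a literal comparison: only "b," prints.
rows | awk -F ',' -v Codigo__AWK="${Codigo-}" '$2 == Codigo__AWK'
```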
