Reputation: 803
I search for log files having errors using egrep
and it outputs a bunch of file names like the ones below. What I want to do is manipulate those strings and present them in a different way.
/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/cirrus_abp4296EI_20220824.log
/abcd/efgh/ijkl/mcr/logs/prof_cmcr_abp4296MR.log.20220824150526.15728964.0
The output should look like:
ABP99507,UNET
ABP3506,OXF
ABP4296EI,CIRRUS
ABP4296MR,CMCR
I tried awk
and sed
and couldn't figure out a way to do this. I want to be able to make it dynamic and do it via regular expressions.
What I have tried so far is:
egrep -li "^error" /abcd/efgh/ijkl/logs/*202207* | awk '/unet|cirrus|oxf|csp|cmcd|cmcr|nice/ {print}'
egrep -li "^error" /abcd/efgh/ijkl/logs/*202207* | sed -n "s/.*\(cirrus|unet|cmcr|csp|cmcd|oxf|nice\)\(abp[0-9]*[A-ZA-Za-za-z]*\).*/\1,\2/p"
sed doesn't work because the "|"
operator is taken literally; I am not using the GNU version. Even escaping it doesn't work. I also can't seem to make use of capture groups.
Upvotes: -1
Views: 258
Reputation: 204259
Throw away egrep
(which is deprecated in favor of grep -E
by the way), and just use awk, e.g. using an awk that supports nextfile
such as GNU awk (also already supported in some other awks and soon will be required per POSIX):
awk -v OFS=',' '
tolower($0) ~ /^error/ {
split(toupper(FILENAME),a,/[_.]/)
print a[3], a[2]
nextfile
}
' /abcd/efgh/ijkl/logs/*202207*
or using any awk:
awk -v OFS=',' '
FNR==1 { searching=1 }
searching && (tolower($0) ~ /^error/) {
split(toupper(FILENAME),a,/[_.]/)
print a[3], a[2]
searching=0
}
' /abcd/efgh/ijkl/logs/*202207*
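Both scripts above print a[3] and a[2], which assumes every file name looks like prefix_system_id.log. One sample in the question, cirrus_abp4296EI_20220824.log, puts the system first and a date third, so those indexes would pick the wrong parts. Here is a minimal sketch of one way around that, assuming the ID always starts with "abp" and the system name is the underscore-separated field right before it. The function name label_from_path is hypothetical, and it reads paths from standard input rather than FILENAME only so that it can be tried in isolation:

```shell
# Hypothetical helper: reads one path per line, prints ID,SYSTEM in upper case.
label_from_path() {
  awk '{
    n = split($0, parts, "/")        # last element is the file name
    base = parts[n]
    sub(/\.log.*/, "", base)         # drop ".log" and anything after it
    m = split(base, f, "_")
    for (i = 2; i <= m; i++)         # look for the first field starting with "abp"
      if (f[i] ~ /^abp/) { print toupper(f[i] "," f[i-1]); break }
  }'
}

printf '%s\n' \
  /abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0 \
  /abcd/efgh/ijkl/logs/cirrus_abp4296EI_20220824.log | label_from_path
# prints:
# ABP99507,UNET
# ABP4296EI,CIRRUS
```

Names with no field starting with "abp" produce no output. The same loop could be moved into the scripts above by running it on FILENAME instead of $0.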
If you really want to implement what you were apparently trying to do with /unet|cirrus|oxf|csp|cmcd|cmcr|nice/
to restrict which files the script examines then change this:
awk -v OFS=',' '
...
}' /abcd/efgh/ijkl/logs/*202207*
to this:
shopt -s extglob
awk -v OFS=',' '
...
}' /abcd/efgh/ijkl/logs/*@(unet|cirrus|oxf|csp|cmcd|cmcr|nice)*202207*
Upvotes: 1
Reputation: 36680
Also I can't seem to make use of capture groups.
You did not escape |
so it is treated as a literal |
; in GNU sed's basic regular expressions you need to escape it to mean alternation, just as is the case with (
and )
(literal vs. group delimiter). After doing that and repairing some minor issues I got it working. Let file.txt
content be
/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0
then
sed -e 's/.*\(cirrus\|unet\|cmcr\|csp\|cmcd\|oxf\|nice\)_\(abp[0-9]*[A-ZA-Za-za-z]*\).*/\2,\1/' -e 's/[a-z]/\U&/g' file.txt
gives output
ABP99507,UNET
ABP3506,OXF
Explanation: I made the following changes: escaped |
, added _
between the groups, changed the order in the replacement (the 2nd group comes first), and dropped /p
because, without -n, it printed each matching line twice. After that I added a second action: uppercasing, using the GNU sed
\U extension. As there are now 2 actions, I use -e
to register them.
(tested in GNU sed 4.2.2)
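Since the question says the sed in use is not GNU, here is a hedged sketch that avoids both \| and \U. It assumes the system name is whatever run of lowercase letters sits directly before "_abp", rather than checking it against the fixed list, and it leaves the uppercasing to tr. The function name to_id_system is hypothetical, and "#" is used as the s delimiter only to avoid escaping "/":

```shell
# Hypothetical helper: POSIX BRE only (no \| and no \U), so GNU sed should not be needed.
to_id_system() {
  # \1 = lowercase word before "_abp", \2 = the abp ID; -n plus p prints only lines that matched
  sed -n 's#.*[/_]\([a-z]*\)_\(abp[0-9]*[A-Za-z]*\).*#\2,\1#p' |
    tr '[:lower:]' '[:upper:]'
}

printf '%s\n' \
  /abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0 \
  /abcd/efgh/ijkl/mcr/logs/prof_cmcr_abp4296MR.log.20220824150526.15728964.0 | to_id_system
# prints:
# ABP3506,OXF
# ABP4296MR,CMCR
```

The function can be placed at the end of the original pipeline instead of the sed command, for example after grep -Eli '^error' on the same file glob. Restricting the output to the seven known system names would need an extra filtering step, which this sketch leaves out.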
Upvotes: 2
Reputation: 133680
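1st solution: The simplest option is to use awk
's field separator option. With your shown samples, please try the following awk
code (the field numbers $7 and $8 assume the directory depth and the prefix_system_id naming of the first two samples).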
awk -F'/|\\.|_' '{print toupper($8","$7)}' Input_file
2nd solution: If you want to use a regular expression in awk
, try the following. Written and tested in GNU awk
; the third (array) argument to match() is a gawk extension.
awk 'match($0,/logs\/[^_]*_([^_]*)_([^.]*)\.log/,arr){print toupper(arr[2]","arr[1])}' Input_file
3rd solution: With GNU sed
, enabling ERE with the -E
option, please try the following code.
sed -E 's/.*logs\/[^_]*_([^_]*)_([^.]*)\.log\..*/\U\2,\U\1/' Input_file
4th solution: A non-GNU awk
solution using the match
function.
awk '
match($0,/logs\/[^_]*_([^_]*)_([^.]*)\.log/){
val=substr($0,RSTART+5,RLENGTH-5)
sub(/\.log/,"",val)
split(val,arr,"_")
print toupper(arr[3]","arr[2])
}
' Input_file
Upvotes: 2