Reputation: 343
I have numerous log files with target lines that I wish to 'grep', such as:
EGPA019_90pc.recode.2.log:Cross-Entropy (masked data): 0.556984
I want to extract the "2" and the "0.556984", separated by a tab, and write them to a file.
So, if I enter:
grep "Cross-Entropy (masked data):" *.log | cut -d '.' -f 3 >> targetFile.txt
I get the "2", and:
grep "Cross-Entropy (masked data):" *.log | cut -d ' ' -f 4 >> targetFile.txt
I get the "0.556984". But how can I write this in a single line of code to obtain the "2" then a tab then "0.556984" on the same line in my target file?
Many thanks
Clive
Upvotes: 0
Views: 827
Reputation: 20022
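You can remove the unwanted parts with sed, replacing the middle of the line with a tab (\t in the replacement is a GNU sed extension):
grep "Cross-Entropy (masked data):" *.log | sed 's/.*recode\.//;s/\..*: /\t/'
The grep and sed can be combined, but only if the matching lines themselves contain the file name; unlike grep, sed does not prefix each line with it:
sed -n '/Cross-Entropy (masked data):/ {s/.*recode\.//;s/\..*: /\t/;p}' *.log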
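To check the grep-then-sed pipeline against the tab requirement, here is a minimal sketch. The temporary directory, the two sample files, and the variable names `tmpdir`, `tab`, and `result` are assumptions made for illustration; the file names only mirror the pattern shown in the question. A literal tab is passed to sed through a variable, so the sketch does not rely on GNU sed's `\t`.

```shell
# Build two throwaway log files whose names follow the question's pattern.
tmpdir=$(mktemp -d)
printf 'Cross-Entropy (masked data): 0.556984\n' > "$tmpdir/EGPA019_90pc.recode.2.log"
printf 'Cross-Entropy (masked data): 0.996984\n' > "$tmpdir/EGPA019_90pc.recode.9.log"

# A literal tab character, so the sed script stays portable.
tab=$(printf '\t')

# grep -H always prints the file-name prefix, even for a single file.
# The first substitution drops everything up to "recode."; the second
# replaces ".log:Cross-Entropy (masked data): " with the tab.
result=$(cd "$tmpdir" && grep -H "Cross-Entropy (masked data):" *.log |
  sed "s/.*\.recode\.//; s/\..*: /$tab/")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Redirecting the pipeline with `>> targetFile.txt`, as in the question, would append the same tab-separated lines to the target file.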
Upvotes: 0
Reputation: 85780
You can use grep together with bash's built-in regex matching.
grep -h "Cross-Entropy (masked data):" *.log | while IFS= read -r string; do
[[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]]
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//[[:blank:]]}";
done
My input files (note that the lines themselves contain the file name; grep -h does not print it as a prefix):
$ cat *.log
EGPA019_90pc.recode.2.log:Cross-Entropy (masked data): 0.556984
EGPA019_90pc.recode.9.log:Cross-Entropy (masked data): 0.996984
EGPA019_90pc.recode.7.log:Cross-Entropy (masked data): 0.756984
$ grep -h "Cross-Entropy (masked data):" *.log | while IFS= read -r string; do
[[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]]
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//[[:blank:]]}"; done
2 0.556984
9 0.996984
7 0.756984
Explanation:
bash's regex matching is used to capture the required strings, instead of other external tools.
The grep output is piped into a loop that applies the regex [[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]], which captures the required entries: the digits after "recode." and the decimal number.
printf prints the captured values. The second capture, the decimal number, may carry leading whitespace, which "${BASH_REMATCH[2]//[[:blank:]]}" removes.
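To see the capture groups in isolation, here is a small sketch that applies a similar pattern to one hard-coded line taken from the question. Keeping the pattern in a variable (`re`, a name chosen here) and checking the result of the match are variations on the loop above, not part of the original answer; `line`, `run`, and `value` are likewise illustrative names.

```shell
#!/bin/bash
# One sample line, as grep prints it when given several files.
line='EGPA019_90pc.recode.2.log:Cross-Entropy (masked data): 0.556984'

# Group 1: the digits after ".recode."; group 2: the text after the last ": ".
# Storing the pattern in a variable avoids quoting surprises inside [[ ]].
re='\.recode\.([[:digit:]]+).*:[[:blank:]]+(.*)$'

if [[ $line =~ $re ]]; then
  run=${BASH_REMATCH[1]}
  value=${BASH_REMATCH[2]}
  printf '%s\t%s\n' "$run" "$value"
else
  printf 'no match: %s\n' "$line" >&2
fi
```

Because the blanks after the colon are consumed by the pattern itself, the second capture needs no further trimming in this variant.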
You can also wrap it in a shell script, as below:
#!/bin/bash
while IFS= read -r string; do
[[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]]
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//[[:blank:]]}"
done < <(grep -h "Cross-Entropy (masked data):" *.log)
(Or) use grep with the PCRE option -P, and xargs to join each pair of matches onto one line.
grep -Pho '\.recode\.\K\d+|: \K.*' *.log | xargs -n2 -d'\n'
2 0.556984
9 0.996984
7 0.756984
(Or) use the much simpler perl regex syntax.
perl -lne 'print "$1 $2" if /\.recode\.(\d+).*:\s+(.*)/' *.log
2 0.556984
9 0.996984
7 0.756984
Upvotes: 2
Reputation: 46856
I think I'd do this using awk rather than parsing the output of grep.
I don't have your dataset to test this on, but it seems to me that the following should work.
awk '/^Cross-Entropy \(masked data\):/ {split(FILENAME,a,".");printf("%s\t%s\n", a[3], $NF)}' *.log
It's a bit long as a one-liner. As a standalone script, it might look like this:
#!/usr/bin/awk -f
/^Cross-Entropy \(masked data\):/ {
split(FILENAME,a,".")
printf("%s\t%s\n", a[3], $NF)
}
Save this in a file, make it executable, and you have yourself a brand new shell command.
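Note that this extracts the two values by field splitting (split() on the file name and $NF on the line), NOT by regex captures; the regex only selects which lines to process.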
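Since the answer was posted untested, here is a minimal sketch that exercises the awk approach on throwaway data. The temporary directory, the sample file contents, and the variable names `tmpdir` and `awk_out` are assumptions for illustration; only the file-name pattern and the target line come from the question.

```shell
# Create two sample logs; the first also has an unrelated line to be skipped.
tmpdir=$(mktemp -d)
printf 'Some other line\nCross-Entropy (masked data): 0.556984\n' > "$tmpdir/EGPA019_90pc.recode.2.log"
printf 'Cross-Entropy (masked data): 0.756984\n' > "$tmpdir/EGPA019_90pc.recode.7.log"

# Run from inside the directory so FILENAME holds the bare file name.
# split() on "." puts the run number in a[3]; $NF is the last field of the line.
awk_out=$(cd "$tmpdir" && awk '/^Cross-Entropy \(masked data\):/ {
  split(FILENAME, a, ".")
  printf("%s\t%s\n", a[3], $NF)
}' *.log)

printf '%s\n' "$awk_out"
rm -rf "$tmpdir"
```

Note that `a[3]` is only the run number when the file name has exactly two dots before it and no directory component containing a dot, which is why the sketch changes into the directory first.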
Upvotes: 0