Reputation: 343

slice string with multiple delimiters

I have numerous log files with target lines that I wish to 'grep', such as:

EGPA019_90pc.recode.2.log:Cross-Entropy (masked data):   0.556984

I wish to extract the "2" and the "0.556984" and write them, separated by a tab, to a file.

So, if I enter:

grep "Cross-Entropy (masked data):" *.log | cut -d '.' -f 3 >> targetFile.txt

I get the "2", and:

grep "Cross-Entropy (masked data):" *.log | cut -d ' ' -f 4 >> targetFile.txt

I get the "0.556984". But how can I write this in a single line of code to obtain the "2" then a tab then "0.556984" on the same line in my target file?

Many thanks

Clive

Upvotes: 0

Answers (3)

Walter A

Reputation: 20022

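You can remove the unwanted parts with sed (the \t in the replacement needs GNU sed):

grep "Cross-Entropy (masked data):" *.log | sed 's/.*\.recode\.//; s/\..*: */\t/'

If the file name is part of each line in the file itself (sed does not add it the way grep does), the grep and sed can be combined:

sed -n '/Cross-Entropy (masked data):/ {s/.*\.recode\.//; s/\..*: */\t/; p}' *.log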
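If the run number appears only in the file name and not in the file contents, sed on its own never sees it. A minimal sketch of one way around that, assuming names of the form NAME.recode.N.log: take the number from the name with parameter expansion and let sed strip only the label. The function name extract_run is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print "<run><TAB><value>" for each log file given.
# Assumes file names look like NAME.recode.<run>.log.
extract_run() {
    local f run value
    for f in "$@"; do
        run=${f%.log}     # drop the trailing ".log"
        run=${run##*.}    # keep what follows the last remaining "."
        # Remove the label and the spaces after it; print only matching lines.
        sed -n 's/.*Cross-Entropy (masked data): *//p' "$f" |
            while IFS= read -r value; do
                printf '%s\t%s\n' "$run" "$value"
            done
    done
}
```

It would be called as: extract_run *.log >> targetFile.txt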

Upvotes: 0

Inian

Reputation: 85780

You can use grep together with bash's built-in regex matching.

grep -h "Cross-Entropy (masked data):" *.log | while IFS= read -r string; do
       [[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]] 
       printf "%s\t%s\n" "${BASH_REMATCH[1]}"  "${BASH_REMATCH[2]//[[:blank:]]}";
done

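My input files (here each line already contains the file name; if the name only comes from grep's file-name prefix, use -H instead of -h so that the prefix is kept):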

$ cat *.log
EGPA019_90pc.recode.2.log:Cross-Entropy (masked data):   0.556984
EGPA019_90pc.recode.9.log:Cross-Entropy (masked data):   0.996984
EGPA019_90pc.recode.7.log:Cross-Entropy (masked data):   0.756984

$ grep -h "Cross-Entropy (masked data):" *.log | while IFS= read -r string; do
       [[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]] 
       printf "%s\t%s\n" "${BASH_REMATCH[1]}"  "${BASH_REMATCH[2]//[[:blank:]]}"; done
2       0.556984
9       0.996984
7       0.756984

Explanation:-

I am using bash's built-in regex feature to capture the required strings, instead of using external tools.
The output of grep is piped into a loop that applies the regex [[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]], which captures the required entries: the run digit(s) and the decimal number.
printf prints the two captures. The 2nd capture, i.e. the decimal number, has leading whitespace, which "${BASH_REMATCH[2]//[[:blank:]]}" removes.
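To make the two capture groups concrete, here is the same kind of match applied to a single sample line, with a check that the match succeeded before BASH_REMATCH is read. This is only a sketch: the function name parse_line is made up for illustration, and the regex is a slightly stricter variant (escaped dots, and a second group that excludes blanks so no stripping is needed) kept in a variable to avoid quoting problems.

```shell
#!/bin/bash
# Hypothetical helper: turn one "file:label:   value" line into "run<TAB>value".
parse_line() {
    local re='\.recode\.([[:digit:]]+).*:[[:blank:]]+([^[:blank:]]+)$'
    # BASH_REMATCH[1] = digits after ".recode."
    # BASH_REMATCH[2] = the non-blank text at the end of the line
    if [[ $1 =~ $re ]]; then
        printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1    # no match: print nothing instead of stale captures
    fi
}

parse_line 'EGPA019_90pc.recode.2.log:Cross-Entropy (masked data):   0.556984'
```

Returning non-zero on a failed match means a caller can skip unexpected lines instead of printing whatever the previous match left in BASH_REMATCH.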

You can also wrap it around in a shell script as below:-

#!/bin/bash

while IFS= read -r string; do
    [[ "$string" =~ .recode.([[:digit:]]+).*:\ (.*)$ ]]
    printf "%s\t%s\n" "${BASH_REMATCH[1]}"  "${BASH_REMATCH[2]//[[:blank:]]}"
done < <(grep -h "Cross-Entropy (masked data):" *.log)

(or) Use grep with its PCRE option -P, and xargs to join the output in pairs.

grep -Pho '\.recode\.\K\d+|: \K.*' *.log | xargs -n2 -d'\n'
2   0.556984
9   0.996984
7   0.756984

(or) Use a much simpler perl regEx syntax.

perl -lne 'print "$1 $2" if /\.recode\.(\d+).*:\s+(.*)/' *.log
2 0.556984
9 0.996984
7 0.756984

Upvotes: 2

ghoti

Reputation: 46856

I think I'd do this using awk rather than parsing the output of grep.

I don't have your dataset to test this on, but it seems to me that the following should work.

awk '/^Cross-Entropy \(masked data\):/ {split(FILENAME,a,".");printf("%s\t%s\n", a[3], $NF)}' *.log

It's a bit long as a one-liner. As a standalone script, it might look like this:

#!/usr/bin/awk -f

/^Cross-Entropy \(masked data\):/ {
  split(FILENAME,a,".")
  printf("%s\t%s\n", a[3], $NF)
}

Save this in a file, make it executable, and you have yourself a brand new shell command.

Note that the extraction works by field splitting (split() on the file name and $NF on the line), NOT by regex captures; the regex only selects the lines.
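Since I can't test against the real data, here is a sketch that wraps the awk program in a shell function and runs it on a throwaway file. The function name cross_entropy_table and the sample file are made up for illustration, and the sample assumes the matching line starts with "Cross-Entropy" inside the file. One deliberate change: it uses the second-to-last piece of the file name rather than a[3], so that dots in a leading directory name don't shift the index.

```shell
#!/bin/bash
# Hypothetical wrapper around the awk approach.
cross_entropy_table() {
    awk '/^Cross-Entropy \(masked data\):/ {
        n = split(FILENAME, a, ".")      # "x.recode.2.log" gives a[n-1]="2", a[n]="log"
        printf("%s\t%s\n", a[n-1], $NF)  # $NF is the last whitespace-separated field
    }' "$@"
}

# Throwaway sample data, only for demonstration.
tmp=$(mktemp -d)
printf 'Cross-Entropy (masked data):   0.556984\n' > "$tmp/EGPA019_90pc.recode.2.log"
cross_entropy_table "$tmp"/*.log
rm -r "$tmp"
```

On the real files it would be called as: cross_entropy_table *.log >> targetFile.txt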

Upvotes: 0
