alvas

Reputation: 122132

Combining multiple cut operations into one

I have the input file:

$ cat bleu.out 
BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828, hyp_len=8982, ref_len=10844)
BLEU = 17.56, 55.1/27.6/15.8/9.4 (BP=0.804, ratio=0.821, hyp_len=8905, ref_len=10844)
BLEU = 17.95, 54.4/27.5/15.6/9.1 (BP=0.837, ratio=0.849, hyp_len=9206, ref_len=10844)
BLEU = 19.10, 54.8/28.1/16.3/9.7 (BP=0.860, ratio=0.869, hyp_len=9423, ref_len=10844)
BLEU = 19.29, 53.0/26.6/15.1/8.9 (BP=0.925, ratio=0.928, hyp_len=10058, ref_len=10844)
BLEU = 18.70, 55.7/28.7/16.4/9.4 (BP=0.839, ratio=0.851, hyp_len=9223, ref_len=10844)
BLEU = 18.63, 55.2/28.1/16.3/9.8 (BP=0.834, ratio=0.846, hyp_len=9178, ref_len=10844)
BLEU = 18.41, 54.2/27.4/15.5/9.2 (BP=0.857, ratio=0.867, hyp_len=9398, ref_len=10844)
BLEU = 18.70, 53.7/26.9/15.7/9.3 (BP=0.871, ratio=0.878, hyp_len=9526, ref_len=10844)

When I need to cut out a certain column, say the third space-separated field of the text before the first comma, I have to use multiple instances of cut, e.g.:

$ cat bleu.out | cut -f1 -d',' | cut -f3 -d ' '
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

Is there a way to apply multiple cut criteria in sequence within a single cut instance? E.g. something like cut-multi.sh -f1 -d',' -f3 -d' '?

If not, what other methods would perform the same operation as cut -f1 -d',' | cut -f3 -d' '? Solutions using awk, sed or the like are also welcome.

Upvotes: 0

Views: 98

Answers (5)

Claes Wikner

Reputation: 1517

awk -F'[ = ,]' '{print $4}' file
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

Upvotes: 0

Arjun Mathew Dan

Reputation: 5298

Another solution with awk:

awk '{sub(/,$/, "", $3); print $3}' bleu.out

This removes the trailing comma from the third whitespace-separated field and prints that field.
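To see what the sub() call changes, here is a minimal sketch that prints the third field before and after the substitution. It uses one line copied from bleu.out; the variable names line, before and after are only for illustration.

```shell
# One sample line taken from bleu.out in the question.
line='BLEU = 19.10, 54.8/28.1/16.3/9.7 (BP=0.860, ratio=0.869, hyp_len=9423, ref_len=10844)'

# With the default field separator (runs of blanks), $3 still carries the comma.
before=$(printf '%s\n' "$line" | awk '{print $3}')

# sub() strips a comma at the end of $3; the field stays a string,
# so the trailing zero in 19.10 is kept.
after=$(printf '%s\n' "$line" | awk '{sub(/,$/, "", $3); print $3}')

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

Because the field is handled as text rather than converted to a number, values such as 19.10 come out unchanged.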

Upvotes: 0

P....

Reputation: 18381

The following solution uses grep with Perl-compatible regular expressions (\K and a lookahead). It prints the text between "= " and the first comma.

grep -oP '= \K.*?(?=,)' input
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

Or, as suggested by Sundeep:

grep -oP '= \K[^,]+' input

Upvotes: 3

Benjamin W.

Reputation: 52182

With sed:

$ sed 's/^[^=]*= \([^,]*\).*/\1/' bleu.out
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

This skips everything up to and including the first = and the space after it (^[^=]*= ), captures the following run of non-comma characters (\([^,]*\)), and replaces the whole line with the capture group (\1).
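The same substitution can also be written with extended regular expressions, which some find easier to read because the group no longer needs backslashes. This is only a sketch: it assumes a sed that accepts -E (GNU and BSD sed both do), and the variable names line and score are just for illustration.

```shell
# One sample line taken from bleu.out in the question.
line='BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828, hyp_len=8982, ref_len=10844)'

# -E switches sed to extended regular expressions, so the capture group
# is written ( ... ) instead of \( ... \). The pattern is otherwise identical.
score=$(printf '%s\n' "$line" | sed -E 's/^[^=]*= ([^,]*).*/\1/')

printf '%s\n' "$score"
```

To process the whole file, the same expression can be given bleu.out as its argument instead of reading from the pipe.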

Upvotes: 2

Sundeep

Reputation: 23667

You can specify multiple field separators in awk:

$ awk -F'= *|,' '{print $2}' bleu.out
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70
  • -F'= *|,' sets the field separator to either = followed by zero or more spaces, or a comma
  • {print $2} prints the second field
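To check how that separator splits a line, the following sketch prints the first three fields in brackets so that leading and trailing spaces are visible. It uses one line copied from bleu.out; the variable names line and fields are only for illustration.

```shell
# One sample line taken from bleu.out in the question.
line='BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828, hyp_len=8982, ref_len=10844)'

# Print the first three fields produced by the regex separator '= *|,'.
fields=$(printf '%s\n' "$line" |
  awk -F'= *|,' '{for (i = 1; i <= 3; i++) printf "field %d: [%s]\n", i, $i}')

printf '%s\n' "$fields"
# field 1: [BLEU ]
# field 2: [16.67]
# field 3: [ 54.4/26.8/14.9/8.2 (BP]
```

The space after = is consumed as part of the separator, which is why the second field is the bare score with no padding.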

Upvotes: 4
