Unix - randomly select lines based on column values

Question

I have a file with ~1000 lines that looks like this:

ABC C5A 1
CFD D5G 4
E1E FDF 3
CFF VBV 1
FGH F4R 2
K8K F9F 3
... etc

I would like to select 100 random lines, 10 for each value in the third column (i.e. 10 random lines from all lines with value "1" in column 3, 10 random lines from all lines with value "2" in column 3, etc.).

Is this possible using bash?

user000001 · Accepted Answer

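If you can use awk, you can do this with a one-liner: shuffle the file, then print each line only while fewer than 10 lines with the same column-3 value have been printed so far.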

sort -R file | awk '{if (count[$3] < 10) {count[$3]++; print $0}}'
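The one-liner works because `count[$3]` records how many lines with a given column-3 value have already been printed, so after a random shuffle the first 10 lines seen for each value are kept. One caveat: GNU `sort -R` orders lines by a random hash of the key, so identical lines end up next to each other instead of being shuffled independently, which can skew the sample if the file contains duplicate lines. A minimal sketch of the same approach, assuming GNU coreutils `shuf` is available (it is not POSIX); the function name `sample_per_group` and its optional count argument are hypothetical, introduced only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper (assumes GNU coreutils `shuf` is installed).
# sample_per_group FILE [N]: print up to N randomly chosen lines for each
# distinct value in column 3 of FILE (N defaults to 10).
sample_per_group() {
  local file=$1 n=${2:-10}
  # shuf writes the lines of FILE in a uniformly random order.
  # awk then keeps a line only while fewer than N lines with the same
  # column-3 value have been printed: count[$3]++ yields the old count.
  shuf -- "$file" | awk -v n="$n" 'count[$3]++ < n'
}
```

For example, `sample_per_group file 10 > sample.txt` followed by `awk '{print $3}' sample.txt | sort | uniq -c` shows how many lines were kept per value. The total comes to exactly 100 only if column 3 holds 10 distinct values with at least 10 lines each; a value with fewer lines contributes all of its lines.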
