Abdel

Reputation: 6106

Unix - randomly select lines based on column values

I have a file with ~1000 lines that looks like this:

ABC C5A 1
CFD D5G 4
E1E FDF 3
CFF VBV 1
FGH F4R 2
K8K F9F 3
... etc

I would like to select 100 random lines, but with 10 of each third column value (so random 10 lines from all lines with value "1" in column 3, random 10 lines from all lines with value "2" in column 3, etc).

Is this possible using bash?

Upvotes: 4

Views: 832

Answers (2)

user000001

Reputation: 33327

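If you can use awk, a one-liner does it: shuffle the whole file with sort -R, then have awk print a line only while fewer than 10 lines with that third-column value have been printed: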

sort -R file | awk '{if (count[$3] < 10) {count[$3]++; print $0}}'
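A rough sketch of the same idea wrapped in a function, in case the count or the column needs to change. The name sample_per_group and its arguments are made up for illustration, and it assumes GNU shuf is installed. It uses shuf rather than sort -R because GNU sort -R orders lines by a hash of the key, so identical lines end up next to each other; that only matters if the file contains duplicate lines.

```shell
#!/usr/bin/env bash
# sample_per_group FILE [N] [COL] (hypothetical helper, not from the answer):
# print up to N random lines for each distinct value found in column COL.
sample_per_group() {
    local file=$1 n=${2:-10} col=${3:-3}
    # Shuffle every line, then keep a line only while its group is under quota.
    shuf "$file" | awk -v n="$n" -v col="$col" 'count[$col]++ < n'
}

# Demo on generated data: 5 groups (column 3 = 1..5) of 20 lines each.
demo=$(mktemp)
for g in 1 2 3 4 5; do
    for k in $(seq 1 20); do
        echo "A$k B$k $g"
    done
done > "$demo"

# Pick 3 lines per group and sort by group so the result is easy to read.
sample_per_group "$demo" 3 | sort -k3,3n
```

For the file in the question that would be sample_per_group file 10 > randomFile. Groups with fewer than N lines simply contribute all of their lines.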

Upvotes: 3

dogbane

Reputation: 274622

First grep all the lines that end with a given number, then shuffle them and pick 10 using shuf -n 10.

for i in {1..10}; do
    grep " ${i}$" file | shuf -n 10
done > randomFile

If you don't have shuf, use sort -R to randomly sort them instead:

for i in {1..10}; do
    grep " ${i}$" file | sort -R | head -n 10
done > randomFile
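If the third-column values are not known in advance, or the columns might be separated by tabs, the loop can be driven by the file itself. This is only a sketch under those assumptions: sample_each_value is a hypothetical name, and it relies on GNU shuf just like the first loop.

```shell
#!/usr/bin/env bash
# sample_each_value FILE [N] (hypothetical helper):
# print up to N random lines for every distinct value in column 3.
sample_each_value() {
    local file=$1 n=${2:-10} value
    # List the distinct third-column values instead of hardcoding 1..10.
    awk 'NF >= 3 { print $3 }' "$file" | sort -u | while read -r value; do
        # Compare the whole field as a string (the "" forces string
        # comparison), so "1" never matches "10" or "01".
        awk -v v="$value" '($3 "") == (v "")' "$file" | shuf -n "$n"
    done
}

# Demo on generated data: groups 1, 2 and 10 with 15 lines each.
demo=$(mktemp)
for g in 1 2 10; do
    for k in $(seq 1 15); do
        echo "X$k Y$k $g"
    done
done > "$demo"

sample_each_value "$demo" 4
```

This reads the file once per distinct value, which is fine for ~1000 lines; for much larger inputs a single shuffled pass through awk would be cheaper.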

Upvotes: 7
