Vladislavs Dovgalecs

Reputation: 1631

Select lines with repetition using standard command-line tools

Given a text file of strings, I would like to draw lines at random with replacement (that is, with repetition).

I know that one can efficiently shuffle the lines with the "shuf" command. Which standard Linux command-line tools can draw lines with repetition?

My current approach is a Python script that generates random integers in the range [1,N], where N is the number of lines. Each generated integer is used to index the list of strings, and the selected line is printed.

Here is my Python script:

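#!/usr/bin/env python

from random import randrange
import sys

fname = sys.argv[1]

# Read all lines, dropping the trailing newline from each.
with open(fname, 'r') as f:
    lines = [s.strip("\n") for s in f]

nlines = len(lines)

# Draw nlines lines uniformly at random, with replacement.
for _ in range(nlines):
    # randrange(nlines) returns an integer in [0, nlines - 1], so the
    # index is always valid. round(random() * nlines) could yield
    # nlines itself, which is out of range.
    idx = randrange(nlines)
    print(lines[idx])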

The sample file is:

a
b
c
d
e
f
g
h

And the result of running the script on the sample is:

c
b
f
b
c
c
b
d

Upvotes: 0

Views: 98

Answers (1)

John1024

Reputation: 113864

Modern versions of shuf offer a -r (--repeat) option, which allows output lines to be repeated. For example:

$ cat input
1
2
3
4
5
$ shuf -n 5 -r input
3
2
5
3
3
$ shuf --version
shuf (GNU coreutils) 8.23

Earlier versions of shuf may lack -r.

Alternative: use awk

$ awk '{a[NR]=$0} END{srand();for (i=1;i<=NR;i++)print a[int(1+NR*rand())]}' input
4
3
1
2
3
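
For comparison with the script in the question, the same draw can be sketched in Python with random.choices, which samples with replacement (available in Python 3.6 and later). This is only a minimal sketch: the function name sample_with_replacement and its k and seed parameters are hypothetical, chosen here for illustration.

```python
import random


def sample_with_replacement(lines, k=None, seed=None):
    """Draw k items from lines uniformly at random, with replacement.

    Hypothetical helper for illustration; k defaults to len(lines),
    matching the behaviour of the awk one-liner above.
    """
    if not lines:
        # random.choices raises IndexError on an empty population.
        return []
    if k is None:
        k = len(lines)
    # A private generator keeps the optional seed from affecting
    # the module-level random state.
    rng = random.Random(seed)
    return rng.choices(lines, k=k)
```

Reading the file into a list of lines and printing each element of sample_with_replacement(lines) should behave like shuf -r with -n set to the number of input lines, although the exact sequence will of course differ between tools.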

Upvotes: 1
