Reputation: 1631
Given a text file of strings, I would like to draw lines at random with replacement (that is, with repetition).
I know that one can efficiently shuffle the lines with the "shuf" command. What standard Linux command-line tools can draw lines with repetition?
My current approach is a Python script that generates random integers in the range [1,N], where N is the number of lines. Each generated integer is used to index the list of strings, and the selected line is printed.
Here is my Python script:
#!/usr/bin/env python

from random import randrange
import sys

fname = sys.argv[1]

# Read all lines, dropping the trailing newline from each.
with open(fname, 'r') as f:
    lines = [s.rstrip("\n") for s in f]

nlines = len(lines)

# Draw nlines lines with replacement. randrange(nlines) is uniform over
# 0..nlines-1; round(random()*nlines) could return nlines (IndexError).
for _ in range(nlines):
    print(lines[randrange(nlines)])
The sample file is:
a
b
c
d
e
f
g
h
And the result of running the script on the sample is:
c
b
f
b
c
c
b
d
Upvotes: 0
Views: 98
Reputation: 113864
Modern versions of shuf offer a -r option for repeat. For example:
$ cat input
1
2
3
4
5
$ shuf -n 5 -r input
3
2
5
3
3
$ shuf --version
shuf (GNU coreutils) 8.23
Earlier versions of shuf may lack -r.
awk
$ awk '{a[NR]=$0} END{srand();for (i=1;i<=NR;i++)print a[int(1+NR*rand())]}' input
4
3
1
2
3
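The awk one-liner above stores every line in an array and then prints NR lines, each picked uniformly at random. For comparison, here is a minimal Python sketch of the same idea. It assumes Python 3.6 or later (for random.choices), and the function name sample_with_replacement is hypothetical, chosen only for illustration:

```python
import random
import sys


def sample_with_replacement(lines, k=None):
    """Return k lines drawn uniformly at random, repetition allowed."""
    if not lines:
        return []  # random.choices raises IndexError on an empty list
    if k is None:
        k = len(lines)  # default: as many draws as there are input lines
    return random.choices(lines, k=k)


# Hypothetical command-line use: python sample.py input
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        lines = [s.rstrip("\n") for s in f]
    for line in sample_with_replacement(lines):
        print(line)
```

Like the awk version, this keeps the whole file in memory, so it suits files that fit comfortably in RAM.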
Upvotes: 1