Reputation: 19
I have the following code to choose 2 random lines from a file containing lines of the form ip:port
:
import (
"os"
"fmt"
"math/rand"
"log"
"time"
"unicode/utf8"
//"bufio"
)
func main() {
fmt.Println("num bytes in line is: \n", utf8.RuneCountInString("10.244.1.8:8080"))
file_pods_array, err_file_pods_array := os.Open("pods_array.txt")
if err_file_pods_array != nil {
log.Fatalf("failed opening file: %s", err_file_pods_array)
}
//16 = num of bytes in ip:port pair
randsource := rand.NewSource(time.Now().UnixNano())
randgenerator := rand.New(randsource)
firstLoc := randgenerator.Intn(10)
secondLoc := randgenerator.Intn(10)
candidate1 := ""
candidate2 := ""
num_bytes_from_start_first := 16 * (firstLoc + 1)
num_bytes_from_start_second := 16 * (secondLoc + 1)
buf_ipport_first := make([]byte, int64(15))
buf_ipport_second := make([]byte, int64(15))
start_first := int64(num_bytes_from_start_first)
start_second := int64(num_bytes_from_start_second)
_, err_first := file_pods_array.ReadAt(buf_ipport_first, start_first)
first_ipport_ep := buf_ipport_first
if err_first == nil {
candidate1 = string(first_ipport_ep)
}
_, err_second := file_pods_array.ReadAt(buf_ipport_second, start_second)
second_ipport_ep := buf_ipport_second
if err_second == nil {
candidate2 = string(second_ipport_ep)
}
fmt.Println("first is: ", candidate1)
fmt.Println("sec is: ", candidate2)
}
This sometimes prints empty or partial lines. Why does this happen and how can I fix it?
Output example:
num bytes in line is:
15
first is: 10.244.1.17:808
sec is:
10.244.1.11:80
Thank you.
Upvotes: 0
Views: 1846
Reputation: 164769
If your lines were of a fixed length you could do this in constant time.
But you don't have fixed length lines. We can't do constant time, but we can do it in constant memory and O(n) time using the technique from The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
That is, as you read each line you have a 1/M chance of picking it. Cumulatively this adds up to 1/N for every line.
If we have three lines, the first line has a 1/1 chance of being picked. Then a 1/2 chance of remaining. Then a 2/3 chance of remaining. Total chance: 1 * 1/2 * 2/3 = 1/3.
The second line has a 1/2 chance of being picked and a 2/3 chance of remaining. Total chance: 1/2 * 2/3 = 1/3.
The third line has a 1/3 chance of being picked.
package main
import(
"bufio"
"fmt"
"os"
"log"
"math/rand"
"time"
);
func main() {
file, err := os.Open("pods_array.txt")
if err != nil {
log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
randsource := rand.NewSource(time.Now().UnixNano())
randgenerator := rand.New(randsource)
lineNum := 1
var pick string
for scanner.Scan() {
line := scanner.Text()
fmt.Printf("Considering %v at 1/%v.\n", scanner.Text(), lineNum)
// Instead of 1 to N it's 0 to N-1
roll := randgenerator.Intn(lineNum)
fmt.Printf("We rolled a %v.\n", roll)
if roll == 0 {
fmt.Printf("Picking line.\n")
pick = line
}
lineNum += 1
}
fmt.Printf("Picked: %v\n", pick)
}
Because rand.Intn(n)
returns [0,n)
, that is from 0 to n-1, we check for 0, not 1.
Maybe you're thinking "what if I seek to a random point in the file and then read the next full line?" That wouldn't quite be constant time, it would beO(longest-line), but it wouldn't be truly random. Longer lines would get picked more frequently.
Note that since these are (I assume) all IP addresses and ports you could have constant record lengths. Store the IPv4 address as a 32 bits and the port as a 16 bits. 48 bits per line.
However, this will break on IPv6. For forward compatibility store everything as IPv6: 128 bits for the IP and 16 bits for the port. 144 bits per line. Convert IPv4 addresses to IPv6 for storage.
This will allow you to pick random addresses in constant time, and it will save disk space.
Alternatively, store them in SQLite.
Upvotes: 1
Reputation: 19
found a solution using ioutil
and strings
:
func main() {
randsource := rand.NewSource(time.Now().UnixNano())
randgenerator := rand.New(randsource)
firstLoc := randgenerator.Intn(10)
secondLoc := randgenerator.Intn(10)
candidate1 := ""
candidate2 := ""
dat, err := ioutil.ReadFile("pods_array.txt")
if err == nil {
ascii := string(dat)
splt := strings.Split(ascii, "\n")
candidate1 = splt[firstLoc]
candidate2 = splt[secondLoc]
}
fmt.Println(candidate1)
fmt.Println(candidate2)
}
Output
10.244.1.3:8080
10.244.1.11:8080
Upvotes: 0