Touff

Reputation: 235

Generate ID number from a name in bash

Currently I have a bunch of names that are tied to numbers, for example:

Joe Bloggs - 17
John Smith - 23
Paul Smith - 24
Joe Bloggs - 32

Using the name and the number, I'd like to generate a random/unique ID made of 4 digits followed by the original number.

So for example, Joe Bloggs and 17 would make something random/unique like: xxxx17.

Is this possible in bash? Would it be better in some other language?
This would be used on Debian- and Darwin-based systems.

Upvotes: 2

Views: 1658

Answers (3)

David C. Rankin

Reputation: 84559

You can get very close to what you want by taking the nanosecond string produced by $(date +%N) (%N is a GNU date extension) and selecting 4 of its digits as the first four characters of the new ID. Take them from the beginning of the string if you want IDs that are closer together, or from the middle for more randomness. After selecting your 4 digits, keep track of the ones already used in an array and check each new candidate against that array before assigning an ID. This overhead is negligible for 10,000 or so IDs:

#!/bin/bash

declare -a used4=()   # array to hold the 4-digit prefixes you have assigned
declare -i dupid=0    # a flag to prompt regeneration in case of a dup

while read -r line || [ -n "$line" ]; do
    name=${line% -*}
    id2=${line##* }

    while [ $dupid -eq 0 ]; do
        ns=$(date +%N)          # fill variable with nanoseconds
        fouri=${ns:4:4}         # take 4 integers (mid 4 for better randomness)

        # test for duplicate: is $fouri one of the space-separated entries already used?
        # (this is a Bash-only test - use a loop if portability is needed)
        [[ " ${used4[*]} " == *" $fouri "* ]] && continue

        newid="${fouri}${id2}"  # concatenate 4 digits + original 2-digit id
        used4+=( "$fouri" )     # add 4ints to used4 array
        dupid=1
    done

    dupid=0                     # reset flag

    printf "%s  =>  %s\n" "$line" "$newid"

done<"$1"

output:

$ bash fourid.sh dat/nameid.dat
Joe Bloggs - 17  =>  762117
John Smith - 23  =>  603623
Paul Smith - 24  =>  210424
Joe Bloggs - 32  =>  504732
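
Since the question mentions Darwin, where date may not support %N, here is a minimal sketch of the same idea that swaps in Bash's $RANDOM for the nanosecond string and uses a plain loop for the duplicate check. The function names is_used, new_prefix and make_ids are made up for illustration and are not part of the script above:

```bash
#!/usr/bin/env bash
# Sketch only: same "pick 4 digits, retry on duplicate" idea, using $RANDOM.

used4=()    # 4-digit prefixes handed out so far

# is_used PREFIX: succeed if PREFIX is already in used4 (plain loop, no regex)
is_used() {
    local candidate=$1 seen
    for seen in "${used4[@]}"; do
        [[ $seen == "$candidate" ]] && return 0
    done
    return 1
}

# new_prefix: set the global variable 'prefix' to an unused 4-digit string
new_prefix() {
    while :; do
        # combine two $RANDOM draws (15 bits each) before reducing to 0000-9999
        printf -v prefix '%04d' $(( (RANDOM * 32768 + RANDOM) % 10000 ))
        is_used "$prefix" || break
    done
    used4+=("$prefix")
}

# make_ids: read "Name - NN" lines on stdin, print "line  =>  PPPPNN"
make_ids() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        new_prefix
        printf '%s  =>  %s%s\n' "$line" "$prefix" "${line##* }"
    done
}
```

It could be run as make_ids < dat/nameid.dat. The retry loop never ends once all 10,000 prefixes are taken, so this only suits inputs well below that size.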

Upvotes: 2

clt60

Reputation: 63932

It is impossible to ensure that a 4-digit hash (checksum) would be unique for a set of 10-character-long names.
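
To illustrate the point, here is a small sketch of what a name-derived 4-digit hash could look like, using the POSIX cksum CRC reduced modulo 10000. The helper name hash4 is hypothetical. The same name always maps to the same 4 digits, but only 10,000 values exist, so once more than 10,000 distinct names are hashed at least two must share a value, and collisions are likely long before that:

```bash
#!/usr/bin/env bash
# Sketch only: deterministic 4-digit value derived from a name.
# hash4 is a made-up helper name used for illustration.

# hash4 NAME: print the cksum CRC of NAME reduced to 4 digits (0000-9999)
hash4() {
    printf '%s' "$1" | cksum | awk '{ printf "%04d\n", $1 % 10000 }'
}
```

Calling hash4 "Joe Bloggs" twice gives the same result both times, which is exactly why uniqueness cannot be guaranteed: the output depends only on the name, and nothing prevents two different names from landing on the same 4 digits.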

As an alternative, you can try

file="./somefile"
paste  -d"\0\n" <(seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")) <(grep -oP '\d+' "$file")

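or, split across lines for better readability (on systems where GNU sort is installed as gsort, as is common on macOS, use that name instead of sort):

paste  -d"\0\n" <(
    seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")
) <(
    grep -oP '\d+' "$file"
)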

For your input, this produces something like:

010817
161523
748024
269032

All lines are in the form RRRRXX, where:

  • the RRRR is a guaranteed-unique random number (from the range 0001 up to 9999)
  • the XX is the number from your input

decomposition:

  • seq produces 9999 4-digit numbers (of course, each number is unique)
  • sort -R puts the lines in random order (it sorts on a random hash of each line, and since every line is distinct you get unique random numbers)
  • head keeps only the first N lines of the shuffled list, where N is the number of lines in your file
  • the number of lines is counted by grep -c '' (more robust than wc -l, which misses a final line that lacks a trailing newline)
  • grep -oP '\d+' extracts the numbers from your file (-P is a GNU grep option)
  • finally, paste joins the two inputs line by line, with an empty delimiter, into the final output
  • the <(..) <(..) is process substitution
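
The steps above can be sketched as a small function. This is only an illustration under a few assumptions: random_ids is a made-up name, sort -R must be available, and grep -oE '[0-9]+$' is swapped in for grep -oP so that it also works with a non-GNU grep, which assumes the number is the last thing on each line:

```bash
#!/usr/bin/env bash
# Sketch only: the paste/seq/sort/head pipeline wrapped in a function.
# Assumes a sort that supports -R and input lines that end in their number.

# random_ids FILE: print one RRRRXX line per line of FILE
random_ids() {
    local file=$1 count
    count=$(grep -c '' "$file")    # also counts a last line without a newline
    paste -d '\0' \
        <(seq -f '%04g' 1 9999 | sort -R | head -n "$count") \
        <(grep -oE '[0-9]+$' "$file")
}
```

For example, random_ids ./somefile prints one 6-digit line per input line. The prefixes differ on every run, but within a run they cannot repeat because each comes from a different line of the shuffled seq output.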

Upvotes: 1

allen1

Reputation: 744

Each name becomes unique once you append its number, unless there are two entries for Joe Bloggs 17. In your case there are two Joe Bloggs entries, one with 17 and one with 32, and "Joe Bloggs 17" and "Joe Bloggs 32" are not the same. Using this, you can simply assign a number to each name + number pair and remember that number in an associative array (dictionary). There is no need for randomness: when you find a pair that isn't already in the dictionary, increment a counter and associate the new value with that pair. If uniqueness is the only goal, then you are in good shape for 10,000 people.

Python is a great language for this, but you can make associative arrays in Bash too (version 4 or later).
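
A minimal sketch of that idea in Bash, assuming version 4 or later (the Bash 3.2 shipped with macOS has no associative arrays). The names prefix_for, counter and assign_ids are made up for illustration:

```bash
#!/usr/bin/env bash
# Sketch only: sequential 4-digit prefixes, one per distinct "name - number" line.
# Needs Bash 4+ for associative arrays (declare -A).

declare -A prefix_for=()   # key: full input line, value: its 4-digit prefix
declare -i counter=0       # last prefix handed out

# assign_ids: read "Name - NN" lines on stdin, print PPPPNN for each
assign_ids() {
    local line prefix
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue                   # skip blank lines
        if [[ -z ${prefix_for[$line]+set} ]]; then   # pair not seen before
            counter+=1
            printf -v prefix '%04d' "$counter"
            prefix_for[$line]=$prefix
        fi
        printf '%s%s\n' "${prefix_for[$line]}" "${line##* }"
    done
}
```

Fed the four lines from the question, assign_ids prints 000117, 000223, 000324 and 000432. A repeated "Joe Bloggs - 17" line would get 000117 again rather than a new prefix.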

Upvotes: 1
