unixpipe

Reputation: 75

script to find similar email users

We run a mail server, and I am trying to write a script that finds all users with similar names, to prevent malicious users from impersonating legitimate ones. For example, a legitimate user may have the address [email protected], but a malicious user may register as [email protected]. The difference, if you look carefully, is that the digit '1' (one) has been replaced with the letter 'l' (el). I want something that can go through /var/vmail/domain/*, find similar names, and alert me (the administrator), so that I can then take the necessary steps. I would really appreciate any help.

Upvotes: 0

Views: 161

Answers (1)

Jeff Bowman

Reputation: 95684

One hacky way to do this is to derive "normalized" versions of your usernames, put those in an associative array as keys mapping to the original input, and use those to find problems.

The example I posted below uses bash associative arrays to store the mapping from normalized name to original name, and tr to switch some characters for other characters (and delete other characters entirely).

I'm assuming that your list of users will fit into memory. You'll also need to tweak the sets of mapped and removed characters to find your preferred balance between catching lookalikes and producing false positives. If your list can't fit in memory, you can use a single file or the filesystem to stand in for the associative array, but honestly, if you're processing that many names you're probably better off with a non-shell programming language.
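
For larger lists, one way to avoid holding every name in a bash array is to let sort do the grouping. What follows is only a minimal sketch under a few assumptions: find_similar is a hypothetical function name, the names live in a file with one name per line, no name contains a tab, and the character mappings are the same ones used in the script in this answer.

```shell
#!/bin/bash
# Sketch only: find_similar is a hypothetical helper, not part of any tool.
# Usage: find_similar FILE   (FILE holds one username per line, no tabs)

normalize() {
  # Same character mappings as the in-memory script in this answer.
  tr 'OISZql' '0152g1' | tr -d '._ -' | tr 'A-Z' 'a-z'
}

find_similar() {
  file=$1
  tab=$(printf '\t')
  # Build "normalized<TAB>original" lines, group equal keys with a stable
  # sort, then print each later name next to the first name in its group.
  normalize < "$file" \
    | paste - "$file" \
    | LC_ALL=C sort -s -t "$tab" -k1,1 \
    | awk -F '\t' '
        { k = $1 "" }                           # force string comparison
        k == ""  { next }                       # skip blank or all-punctuation names
        k == key { print first, $2; next }      # same key as the group leader
        { key = k; first = $2 }                 # start a new group
      '
}

if [ "$#" -gt 0 ]; then
  find_similar "$1"
fi
```

Because the sort is stable, the first name printed in each pair is the one that appeared earliest in the file, which matches the behavior of the associative-array version.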

Input:

doc
dopey
james2014
happy
bashful
grumpy
james20l4
sleepy
james.2014
sneezy

Script:

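#!/bin/bash
# stdin: A list of usernames, one per line. stdout: Pairs of names that match.
# Requires bash 4 or later for associative arrays.
# The hyphen goes last in CHARS_TO_REMOVE so tr treats it literally, not as a range.
CHARS_TO_REMOVE="._ -"
CHARS_TO_MAP_FROM="OISZql"
CHARS_TO_MAP_TO="0152g1"

normalize() {
  # stdin: A word. stdout: A modified version of the same word.
  # Lookalikes are mapped before lowercasing, so only capital O, I, S, and Z
  # are treated as digits; lowercase q and l are mapped as written.
  tr "$CHARS_TO_MAP_FROM" "$CHARS_TO_MAP_TO" \
      | tr -d "$CHARS_TO_REMOVE" \
      | tr "A-Z" "a-z"
}

declare -A NORMALIZED_NAMES
# IFS= and -r keep read from trimming whitespace or interpreting backslashes.
while IFS= read -r NAME; do
  NORMALIZED_NAME=$(normalize <<< "$NAME")
  # An empty key is not a valid array subscript, so skip blank or all-punctuation names.
  [[ -z $NORMALIZED_NAME ]] && continue
  # -n tests for a non-empty string, which is what we get if this normalized name was stored already.
  if [[ -n ${NORMALIZED_NAMES[$NORMALIZED_NAME]} ]]; then
    # This name has been seen before! Print both of them.
    echo "${NORMALIZED_NAMES[$NORMALIZED_NAME]} $NAME"
  else
    # This name has not been seen before. Store it.
    NORMALIZED_NAMES["$NORMALIZED_NAME"]="$NAME"
  fi
done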

Output:

james2014 james20l4
james2014 james.2014
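
To run the script against the mail store from the question, the names have to come from the mailbox directories. Here is a minimal sketch, assuming each mailbox is a directory named after its user under /var/vmail/domain (the layout may differ on your server). Both list_mailbox_names and find_similar.sh are hypothetical names; the latter stands for wherever you saved the script above.

```shell
#!/bin/bash
# Sketch only: list_mailbox_names is a hypothetical helper.
# Assumption: every mailbox is a directory named after its user.
list_mailbox_names() {
  for path in "$1"/*; do
    [ -d "$path" ] || continue      # skip plain files and unmatched globs
    printf '%s\n' "${path##*/}"     # strip the leading directory part
  done
}

# Example (the path and the script name are assumptions):
#   list_mailbox_names /var/vmail/domain | ./find_similar.sh
```

Any pairs printed by that pipeline are candidates for the administrator to review by hand; the normalization is deliberately loose, so expect some false positives.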

Upvotes: 1
