Reputation: 85
I'm trying to make a list containing 25 different passwords to check against another list of 50, and come back with the matches. This is for a university project on passwords. The idea is the list of 25 are the most commonly used passwords, and I would like R to tell me which of my 50 passwords match the most common 25. However I keep receiving the following error:
Error in $<-.data.frame(*tmp*, "Percent", value = character(0)) :
replacement has 0 rows, data has 25
I am using the following code
makeCounts <- function(x) {
return(x=list("count"=sum(grepl(x, Final_DF$pswd, ignore.case=TRUE))))
}
#creates a local variable named tmp which is removed afterwards
printCounts <- function(ct) {
tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF$Pswd) * 100)))
print(tmp[order(-tmp$Count),], row.names=FALSE)
}
# create top 25 most commonly used pswds
worst.pass <- c("password", "123456", "12345678", "qwerty", "abc123",
"monkey", "1234567", "Qwertyuiop", "123", "dragon",
"000000", "1111111", "iloveyou", "1234", "12345",
"1234567890", "1q2w3e4r5t", "ashely", "shadow", "123123",
"654321", "superman", "sunshine", "tinkle", "football")
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
printCounts(worst.ct)
The 50 passwords are stored in the Pswd column of my data frame, Final_DF$Pswd, and are as follows:
> Final_DF$Pswd
[1] "monkey" "iloveyou" "dragon" "jbI2pnK$xi" "password" "computer" "!qessw"
[8] "tUNh&SSm6!" "sunshine" "wYrUeWV" "superman" "samsung" "utoXGe6$" "master"
[15] "wjZC&OvXX" "0R1cNTm9sGir" "Fbuu2bs89?" "pokemon" "secret" "x&W1TjO59" "buster"
[22] "purple" "shine" "flower" "marina" "Tg%OQT$0" "SbDUV&nOX" "peanut"
[29] "angel" "?1LOEc4Zfk" "computer" "spiderman" "nothing" "$M6LgmQgv$" "orange"
[36] "knight" "american" "outback" "TfuRpt3PiZ" "air" "surf" "lEi2a$$eyz"
[43] "date" "V$683rx$p" "newcastle" "estate" "foxy" "ginger" "coffee"
[50] "legs"
The traceback of the error when I run printCounts(worst.ct) reads:
Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) :
replacement has 0 rows, data has 25
4.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
"replacement has %d rows, data has %d"), N, nrows), domain = NA)
3.
`$<-.data.frame`(`*tmp*`, "Percent", value = character(0))
2.
`$<-`(`*tmp*`, "Percent", value = character(0))
1.
printCounts(worst.ct)
I have read a couple of forum posts, and I am not sure whether this has something to do with NA values. I am new to R and have been looking at this for some time, scratching my head.
Can anybody please tell me where I am going wrong?
> dput(Final_DF)
structure(list(gender = c("female", "male", "male", "female",
"female", "male", "male", "male", "male", "female", "male", "male",
"female", "female", "female", "female", "male", "female", "male",
"male", "female", "female", "female", "female", "female", "female",
"male", "female", "female", "female", "female", "female", "female",
"female", "male", "male", "female", "female", "male", "female",
"female", "male", "female", "female", "male", "male", "male",
"male", "male", "male"), age = structure(c(47L, 43L, 65L, 24L,
44L, 60L, 26L, 25L, 62L, 23L, 44L, 61L, 27L, 47L, 18L, 23L, 34L,
77L, 71L, 19L, 64L, 61L, 22L, 55L, 45L, 29L, 21L, 64L, 43L, 20L,
32L, 55L, 68L, 21L, 81L, 43L, 63L, 72L, 38L, 20L, 66L, 39L, 64L,
20L, 73L, 21L, 53L, 75L, 69L, 82L), class = c("variable", "integer"
), varname = "Age"), web_browser = structure(c(1L, 1L, 4L, 1L,
3L, 3L, 2L, 1L, 4L, 1L, 1L, 1L, 3L, 4L, 1L, 2L, 1L, 3L, 3L, 2L,
1L, 1L, 1L, 3L, 4L, 3L, 4L, 4L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L,
1L, 2L, 3L, 4L, 2L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 4L, 1L), .Label = c("Chrome",
"Internet Explorer", "Firefox", "Netscape"), class = c("variable",
"factor"), varname = "Browser"), Pswd = c("monkey", "iloveyou",
"dragon", "jbI2pnK$xi", "password", "computer", "!qessw", "tUNh&SSm6!",
"sunshine", "wYrUeWV", "superman", "samsung", "utoXGe6$", "master",
"wjZC&OvXX", "0R1cNTm9sGir", "Fbuu2bs89?", "pokemon", "secret",
"x&W1TjO59", "buster", "purple", "shine", "flower", "marina",
"Tg%OQT$0", "SbDUV&nOX", "peanut", "angel", "?1LOEc4Zfk", "computer",
"spiderman", "nothing", "$M6LgmQgv$", "orange", "knight", "american",
"outback", "TfuRpt3PiZ", "air", "surf", "lEi2a$$eyz", "date",
"V$683rx$p", "newcastle", "estate", "foxy", "ginger", "coffee",
"legs"), pswd_length = c(6L, 8L, 6L, 10L, 8L, 8L, 6L, 10L, 8L,
7L, 8L, 7L, 8L, 6L, 9L, 12L, 10L, 7L, 6L, 9L, 6L, 6L, 5L, 6L,
6L, 8L, 9L, 6L, 5L, 10L, 8L, 9L, 7L, 10L, 6L, 6L, 8L, 7L, 10L,
3L, 4L, 10L, 4L, 9L, 9L, 6L, 4L, 6L, 6L, 4L), last.num = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 9, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA)), row.names = c(NA, -50L), class = "data.frame")
Upvotes: 1
Views: 5182
Reputation: 4419
If you only want to check whether a (set of) password(s) is in a set of bad passwords, you could use
Final_DF$Pswd %in% worst.pass
This will give you a vector of TRUE or FALSE. You could run sum(Final_DF$Pswd %in% worst.pass) to get the total number of bad password matches, or table(Final_DF$Pswd[Final_DF$Pswd %in% worst.pass]) for a quick overview of matches.
However, if your intention is to check a set where passwords are constantly added (which I'm guessing is the intention, since you made the functions), the following might be useful:
# position of each password in worst.pass, or NA when it is not a bad password
result <- rep(NA_integer_, length(Final_DF$Pswd))
for (i in seq_along(Final_DF$Pswd)) {
  if (Final_DF$Pswd[i] %in% worst.pass) {
    result[i] <- which(worst.pass == Final_DF$Pswd[i])
  }
}
table(worst.pass[result[!is.na(result)]])
The result is a table with the count of the matches. In your case,
dragon iloveyou monkey password sunshine superman
1 1 1 1 1 1
Note that looping is not advisable for a large number of passwords. In that case, neat tidyverse approaches would be worth looking at.
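As a rough illustration of a loop-free version, here is a minimal base-R sketch (not a tidyverse one) using match(). The vectors worst and pswd are small hypothetical stand-ins for worst.pass and Final_DF$Pswd, and it counts exact matches only.

```r
# Hypothetical stand-ins for worst.pass and Final_DF$Pswd
worst <- c("password", "monkey", "dragon", "qwerty")
pswd  <- c("monkey", "computer", "dragon", "monkey", "secret")

# match() gives, for each password, its position in `worst` (NA if absent)
idx <- match(pswd, worst)

# Tabulate the matched terms; factor() with explicit levels keeps
# the bad passwords that never occur as zero counts
counts <- table(factor(worst[idx], levels = worst))
print(counts)
```

Because match() is vectorised, no explicit loop or incremental assignment is needed, which should scale better than the for loop above.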
Upvotes: 1
Reputation: 160397
There are several things that appear wrong with your functions.
makeCounts is referencing pswd, but Final_DF has Pswd and pswd_length. R is doing a partial match for pswd, and I'm guessing that it is not the one you want. Let's prove what it is using, first by setting an option[1]:
options(warnPartialMatchDollar = TRUE) # see ?options
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
# Warning in Final_DF$pswd : partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
### ...repeated...
Worse, if you look at this variable (part of troubleshooting your problem is to check the variables you are making and using), you'll see that it is effectively empty/useless, where all values are 0:
str(worst.ct)
# List of 25
# $ password :List of 1
# ..$ count: int 0
# $ 123456 :List of 1
# ..$ count: int 0
# $ 12345678 :List of 1
# ..$ count: int 0
# $ qwerty :List of 1
# ..$ count: int 0
### ...truncated...
If you change your function to use the correct column name, it provides no such warning, and it does contain some non-zero elements:
makeCounts <- function(x) {
return(x=list("count"=sum(grepl(x, Final_DF$Pswd, ignore.case=TRUE))))
}
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE) # re-run with the corrected function
table(unlist(worst.ct))
# 0 1
# 19 6
str(worst.ct)
# List of 25
# $ password :List of 1
# ..$ count: int 1
# $ 123456 :List of 1
# ..$ count: int 0
# $ 12345678 :List of 1
# ..$ count: int 0
# $ qwerty :List of 1
# ..$ count: int 0
### ...truncated...
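Within your printCounts function, you are referencing nrow(Final_DF$Pswd), which is always going to produce NULL, because Final_DF$Pswd is a plain vector with no dimensions. Dividing by NULL gives a zero-length result, so sprintf returns character(0), and that is the "replacement has 0 rows" error you are seeing. Have you tried this?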
nrow(Final_DF$Pswd)
# NULL
nrow(Final_DF)
# [1] 50
Instead, rewrite that line to be
tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
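To make the chain from NULL to the reported error concrete, here is a small self-contained sketch. The vector pswd is a hypothetical stand-in for Final_DF$Pswd, and the counts are made up for illustration.

```r
pswd <- c("monkey", "dragon")            # hypothetical character vector

n <- nrow(pswd)                          # NULL: a plain vector has no dim attribute
ratio <- c(1, 0, 2) / n                  # arithmetic with NULL yields numeric(0)
pct <- sprintf("%3.2f%%", ratio * 100)   # zero-length input, so character(0)

# Assigning a zero-length vector to a column of a non-empty data frame
# is what triggers "replacement has 0 rows, data has ..."
length(pct)

# For a plain vector, length() or NROW() gives the number of elements
NROW(pswd)
```

If you ever do want to count the elements of the column itself rather than the rows of the frame, length(Final_DF$Pswd) or NROW(Final_DF$Pswd) would be the calls to reach for.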
This is not a syntax error, but having your function rely on a variable that is neither defined within it nor passed to it is bad practice: it means the function can behave differently when the same parameters are passed to it, which breaks reproducibility (and it can make troubleshooting rather difficult).
I suggest making Final_DF an argument for the function, and passing it every time.
printCounts <- function(ct, Final_DF) {
tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
print(tmp[order(-tmp$Count),], row.names=FALSE)
}
printCounts(worst.ct)
# Error in nrow(Final_DF) : argument "Final_DF" is missing, with no default
printCounts(worst.ct, Final_DF) # no error here
For this case, I'm recommending that you do not provide a default value for it. This also enables you to use the same function with different "final" frames of passwords, in case you are testing (unit-testing) or testing (train/test sampling) or testing (troubleshooting).
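The same reasoning could be applied to makeCounts, which still reads Final_DF from the global environment. A minimal sketch, assuming a hypothetical function name count_matches and a hypothetical passwords argument (neither is from the question):

```r
# count_matches() is a hypothetical variant of makeCounts that takes the
# password vector explicitly instead of reading a global data frame
count_matches <- function(term, passwords) {
  list(count = sum(grepl(term, passwords, ignore.case = TRUE)))
}

# Small made-up inputs for illustration
pswd  <- c("monkey", "iloveyou", "Monkey1", "secret")
worst <- c("monkey", "qwerty")

# Extra arguments to sapply() are forwarded to the function
ct <- sapply(worst, count_matches, passwords = pswd, simplify = FALSE)
str(ct)
```

Note that grepl treats its first argument as a regular expression and matches substrings, so "Monkey1" is counted under "monkey" here; if only exact matches should count, %in% or match() would be the safer choice. This is an optional extension, and the output shown below does not depend on it.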
After those changes, I get this:
printCounts(worst.ct, Final_DF)
# Term Count Percent
# password 1 2.00%
# monkey 1 2.00%
# dragon 1 2.00%
# iloveyou 1 2.00%
# superman 1 2.00%
# sunshine 1 2.00%
# 123456 0 0.00%
# 12345678 0 0.00%
# qwerty 0 0.00%
# abc123 0 0.00%
# 1234567 0 0.00%
# Qwertyuiop 0 0.00%
# 123 0 0.00%
# 000000 0 0.00%
# 1111111 0 0.00%
# 1234 0 0.00%
# 12345 0 0.00%
# 1234567890 0 0.00%
# 1q2w3e4r5t 0 0.00%
# ashely 0 0.00%
# shadow 0 0.00%
# 123123 0 0.00%
# 654321 0 0.00%
# tinkle 0 0.00%
# football 0 0.00%
Note:
I have options(warnPartialMatchDollar=TRUE, warnPartialMatchAttr=TRUE) set in my ~/.Rprofile (and any project-specific .Rprofile init file) for just this reason: the $ silently does partial matching, and this can be very problematic. With the warning, at least you can see what R is inferring in the background. There is a third option, warnPartialMatchArgs, that has the same intent ... but waaaaaaaaaay too many package authors out there are inadvertently relying on this behavior, so lacking the time/ability to fix them all, I have chosen to muffle this noise-maker.
Especially if this partial-matching behavior is a surprise to you, I strongly encourage you to set the first two options yourself. In the best-case, it produces no warnings and you have the comfort of knowing that you are taking steps to produce more resilient code; at worst, it is noisy and you eventually get tired of the noise and fix the lazy code.
See ?options for these three among many other available options. (Packages can set their own options as well; an option is similar in concept to Windows' registry, for better or worse, in that it is global to R, and can have arbitrary keys and values.)
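If you want to see the partial matching in isolation, here is a small sketch on a hypothetical two-column data frame that mimics the column names in the question. It also shows that [[ matches names exactly by default, which is one way to avoid the surprise.

```r
# Hypothetical data frame with the same two column names as in the question
df <- data.frame(Pswd = c("monkey", "dragon"),
                 pswd_length = c(6L, 6L))

# `$` is case-sensitive but allows partial matching: "pswd" does not match
# "Pswd" exactly, so it falls through to the unique prefix match "pswd_length"
partial <- df$pswd

# `[[` uses exact matching by default, so the misspelled name gives NULL
exact <- df[["pswd"]]

str(partial)
is.null(exact)
```

A NULL from [[ tends to fail loudly further down the line, whereas the silent partial match from $ quietly hands you the wrong column.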
Upvotes: 2