Marty

Reputation: 85

Replacement has 0 rows, data has 25 error

I'm trying to check a list of 50 passwords against a list of 25 others and get back the matches. This is for a university project on passwords. The list of 25 holds the most commonly used passwords, and I would like R to tell me which of my 50 passwords match any of them. However, I keep receiving the following error:

Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) :
  replacement has 0 rows, data has 25

I am using the following code:

makeCounts <- function(x) {
  return(x=list("count"=sum(grepl(x, Final_DF$pswd, ignore.case=TRUE))))  
}

#creates a local variable named tmp which is removed afterwards
printCounts <- function(ct) {
  tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
  tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF$Pswd) * 100)))
  print(tmp[order(-tmp$Count),], row.names=FALSE)
}

# create top 25 mostly commonly used pswds

worst.pass <- c("password", "123456", "12345678", "qwerty", "abc123", 
                "monkey", "1234567", "Qwertyuiop", "123", "dragon", 
                "000000", "1111111", "iloveyou", "1234", "12345", 
                "1234567890", "1q2w3e4r5t", "ashely", "shadow", "123123", 
                "654321", "superman", "sunshine", "tinkle", "football")

worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
printCounts(worst.ct)

My 50 passwords are stored in the column Final_DF$Pswd of my data frame, which looks as follows:

> Final_DF$Pswd
 [1] "monkey"       "iloveyou"     "dragon"       "jbI2pnK$xi"   "password"     "computer"     "!qessw"      
 [8] "tUNh&SSm6!"   "sunshine"     "wYrUeWV"      "superman"     "samsung"      "utoXGe6$"     "master"      
[15] "wjZC&OvXX"    "0R1cNTm9sGir" "Fbuu2bs89?"   "pokemon"      "secret"       "x&W1TjO59"    "buster"      
[22] "purple"       "shine"        "flower"       "marina"       "Tg%OQT$0"     "SbDUV&nOX"    "peanut"      
[29] "angel"        "?1LOEc4Zfk"   "computer"     "spiderman"    "nothing"      "$M6LgmQgv$"   "orange"      
[36] "knight"       "american"     "outback"      "TfuRpt3PiZ"   "air"          "surf"         "lEi2a$$eyz"  
[43] "date"         "V$683rx$p"    "newcastle"    "estate"       "foxy"         "ginger"       "coffee"      
[50] "legs" 

The traceback shown when I run printCounts(worst.ct) reads:

 Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) : 
  replacement has 0 rows, data has 25 
4.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
    "replacement has %d rows, data has %d"), N, nrows), domain = NA) 
3.
`$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) 
2.
`$<-`(`*tmp*`, "Percent", value = character(0)) 
1.
printCounts(worst.ct) 

I have read a couple of forum posts, and I am not sure whether this has something to do with NA values. I am new to R and have been scratching my head over this for some time.

Can anybody please tell me where I am going wrong?

> dput(Final_DF)
structure(list(gender = c("female", "male", "male", "female", 
"female", "male", "male", "male", "male", "female", "male", "male", 
"female", "female", "female", "female", "male", "female", "male", 
"male", "female", "female", "female", "female", "female", "female", 
"male", "female", "female", "female", "female", "female", "female", 
"female", "male", "male", "female", "female", "male", "female", 
"female", "male", "female", "female", "male", "male", "male", 
"male", "male", "male"), age = structure(c(47L, 43L, 65L, 24L, 
44L, 60L, 26L, 25L, 62L, 23L, 44L, 61L, 27L, 47L, 18L, 23L, 34L, 
77L, 71L, 19L, 64L, 61L, 22L, 55L, 45L, 29L, 21L, 64L, 43L, 20L, 
32L, 55L, 68L, 21L, 81L, 43L, 63L, 72L, 38L, 20L, 66L, 39L, 64L, 
20L, 73L, 21L, 53L, 75L, 69L, 82L), class = c("variable", "integer"
), varname = "Age"), web_browser = structure(c(1L, 1L, 4L, 1L, 
3L, 3L, 2L, 1L, 4L, 1L, 1L, 1L, 3L, 4L, 1L, 2L, 1L, 3L, 3L, 2L, 
1L, 1L, 1L, 3L, 4L, 3L, 4L, 4L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 
1L, 2L, 3L, 4L, 2L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 4L, 1L), .Label = c("Chrome", 
"Internet Explorer", "Firefox", "Netscape"), class = c("variable", 
"factor"), varname = "Browser"), Pswd = c("monkey", "iloveyou", 
"dragon", "jbI2pnK$xi", "password", "computer", "!qessw", "tUNh&SSm6!", 
"sunshine", "wYrUeWV", "superman", "samsung", "utoXGe6$", "master", 
"wjZC&OvXX", "0R1cNTm9sGir", "Fbuu2bs89?", "pokemon", "secret", 
"x&W1TjO59", "buster", "purple", "shine", "flower", "marina", 
"Tg%OQT$0", "SbDUV&nOX", "peanut", "angel", "?1LOEc4Zfk", "computer", 
"spiderman", "nothing", "$M6LgmQgv$", "orange", "knight", "american", 
"outback", "TfuRpt3PiZ", "air", "surf", "lEi2a$$eyz", "date", 
"V$683rx$p", "newcastle", "estate", "foxy", "ginger", "coffee", 
"legs"), pswd_length = c(6L, 8L, 6L, 10L, 8L, 8L, 6L, 10L, 8L, 
7L, 8L, 7L, 8L, 6L, 9L, 12L, 10L, 7L, 6L, 9L, 6L, 6L, 5L, 6L, 
6L, 8L, 9L, 6L, 5L, 10L, 8L, 9L, 7L, 10L, 6L, 6L, 8L, 7L, 10L, 
3L, 4L, 10L, 4L, 9L, 9L, 6L, 4L, 6L, 6L, 4L), last.num = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 9, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA)), row.names = c(NA, -50L), class = "data.frame")

Upvotes: 1

Views: 5182

Answers (2)

Donald Seinen

Reputation: 4419

If you only want to check whether a (set of) password(s) is in a set of bad passwords, you could use

Final_DF$Pswd %in% worst.pass

This will give you a vector of TRUE or FALSE. You could run sum(Final_DF$Pswd %in% worst.pass) to get the total number of bad password matches, or table(Final_DF$Pswd[Final_DF$Pswd %in% worst.pass]) for a quick overview of matches.
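
As a small illustration of those three calls, here is a sketch on made-up data. The names pswds and worst are stand-ins for Final_DF$Pswd and worst.pass, not your actual data:

```r
# Hypothetical stand-ins for Final_DF$Pswd and worst.pass
pswds <- c("monkey", "tUNh&SSm6!", "dragon", "monkey", "coffee")
worst <- c("password", "monkey", "dragon")

# %in% does an exact, case-sensitive comparison, element by element
is_bad <- pswds %in% worst
is_bad
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# TRUE counts as 1, so sum() gives the number of bad passwords
n_bad <- sum(is_bad)
n_bad
# [1] 3

# table() counts how often each matching password occurs
# (here: dragon once, monkey twice)
bad_tab <- table(pswds[is_bad])
```

Note that %in% only finds whole-string matches, whereas the grepl() call in your function also counts passwords that merely contain the term.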

However, if your intention is to check a set where passwords are constantly added (which I'm guessing is the intention, since you made the functions), the following might be useful:

# position in worst.pass of each password, or NA if it is not a common one
result <- rep(NA_integer_, length(Final_DF$Pswd))
for (i in seq_along(Final_DF$Pswd)) {
    if (Final_DF$Pswd[i] %in% worst.pass) {
        result[i] <- which(worst.pass == Final_DF$Pswd[i])
    }
}
table(worst.pass[result[!is.na(result)]])

The result is a table with the count of each match. In your case,

  dragon iloveyou   monkey password sunshine superman 
       1        1        1        1        1        1 

Note that looping is not advisable for a large number of passwords. In that case, vectorised or tidyverse approaches would be worth looking at.
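
One possible loop-free version in base R, as a sketch only: match() returns the same positions the loop computes, provided the list of bad passwords has no duplicates. The vectors pswds and worst are made-up stand-ins for Final_DF$Pswd and worst.pass:

```r
# Hypothetical stand-ins for Final_DF$Pswd and worst.pass
pswds <- c("monkey", "tUNh&SSm6!", "dragon", "monkey", "coffee")
worst <- c("password", "monkey", "dragon")

# For each password: its position in `worst`, or NA if it is not in there
result <- match(pswds, worst)
result
# [1]  2 NA  3  2 NA

# Using all of `worst` as factor levels keeps the zero counts in the table;
# the NA entries are dropped by table() by default
counts <- table(factor(worst[result], levels = worst))
```

Here counts holds 0 for password, 2 for monkey and 1 for dragon, so unmatched common passwords stay visible in the result.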

Upvotes: 1

r2evans

Reputation: 160397

There are several things that appear wrong with your functions.

  1. makeCounts is referencing pswd, but Final_DF has Pswd and pswd_length. Since $ is case-sensitive, there is no exact match for pswd, so R does a partial match, and I'm guessing that the column it picks is not the one you want. Let's prove what it is using, first by setting an option[1]:

    options(warnPartialMatchDollar = TRUE) # see ?options
    worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
    # Warning in Final_DF$pswd : partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    ### ...repeated...
    

    Worse, if you look at this variable (part of troubleshooting your problem is to check the variables you are making and using), you'll see that it is effectively empty/useless, where all values are 0:

    str(worst.ct)
    # List of 25
    #  $ password  :List of 1
    #   ..$ count: int 0
    #  $ 123456    :List of 1
    #   ..$ count: int 0
    #  $ 12345678  :List of 1
    #   ..$ count: int 0
    #  $ qwerty    :List of 1
    #   ..$ count: int 0
    ### ...truncated...
    

    If you change your function to use the correct column name, it provides no such warning, and it does contain some non-zero elements:

    makeCounts <- function(x) {
      return(x=list("count"=sum(grepl(x, Final_DF$Pswd, ignore.case=TRUE))))  
    }
    worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE) # re-run with the fixed function
    table(unlist(worst.ct))
    #  0  1 
    # 19  6 
    
    str(worst.ct)
    # List of 25
    #  $ password  :List of 1
    #   ..$ count: int 1
    #  $ 123456    :List of 1
    #   ..$ count: int 0
    #  $ 12345678  :List of 1
    #   ..$ count: int 0
    #  $ qwerty    :List of 1
    #   ..$ count: int 0
    ### ...truncated...
    
  2. Within your printCounts function, you are referencing nrow(Final_DF$Pswd), which is always going to produce NULL because Final_DF$Pswd is a plain vector, not a data frame. Have you tried this?

    nrow(Final_DF$Pswd)
    # NULL
    nrow(Final_DF)
    # [1] 50
    

    Instead, rewrite that line to be:

    tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
    
  3. Not a syntax error, but your function relying on a variable that is neither defined within it nor passed to it is bad practice: it means the function can behave differently when the same parameters are passed to it, which breaks reproducibility (and it can make troubleshooting rather difficult).

    I suggest making Final_DF an argument for the function, and passing it every time.

    printCounts <- function(ct, Final_DF) {
      tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
      tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
      print(tmp[order(-tmp$Count),], row.names=FALSE)
    }
    
    printCounts(worst.ct)
    # Error in nrow(Final_DF) : argument "Final_DF" is missing, with no default
    
    printCounts(worst.ct, Final_DF) # no error here
    

    For this case, I recommend that you do not provide a default value for it. This also enables you to use the same function with different "final" frames of passwords, in case you are testing (unit-testing) or testing (train/test sampling) or testing (troubleshooting).
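
To make point 2 concrete, here is a minimal sketch of how that NULL turns into the "replacement has 0 rows" error. The vector and the two-row frame are made up for illustration:

```r
# A plain character vector (made-up values), like Final_DF$Pswd
pswds <- c("monkey", "dragon", "coffee")

nrow(pswds)    # NULL: a vector has no dim attribute
length(pswds)  # 3
NROW(pswds)    # 3: NROW() treats a vector as a one-column matrix

# Arithmetic with NULL gives a zero-length numeric vector ...
ratio <- 1 / nrow(pswds)
# ... and sprintf() on zero-length input gives character(0)
pct <- sprintf("%3.2f%%", ratio)

# Assigning a zero-length value to a column of a 2-row frame fails
tmp <- data.frame(Term = c("a", "b"), Count = c(1, 0))
res <- try(tmp$Percent <- pct, silent = TRUE)
inherits(res, "try-error")
# [1] TRUE
```

So the error is raised by the assignment to tmp$Percent, but its cause sits one step earlier, in the nrow() call.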

After those changes, I get this:

printCounts(worst.ct, Final_DF)
#        Term Count Percent
#    password     1   2.00%
#      monkey     1   2.00%
#      dragon     1   2.00%
#    iloveyou     1   2.00%
#    superman     1   2.00%
#    sunshine     1   2.00%
#      123456     0   0.00%
#    12345678     0   0.00%
#      qwerty     0   0.00%
#      abc123     0   0.00%
#     1234567     0   0.00%
#  Qwertyuiop     0   0.00%
#         123     0   0.00%
#      000000     0   0.00%
#     1111111     0   0.00%
#        1234     0   0.00%
#       12345     0   0.00%
#  1234567890     0   0.00%
#  1q2w3e4r5t     0   0.00%
#      ashely     0   0.00%
#      shadow     0   0.00%
#      123123     0   0.00%
#      654321     0   0.00%
#      tinkle     0   0.00%
#    football     0   0.00%
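
Point 3 applies to makeCounts as well, since it also reads Final_DF from the global environment. A possible variation, as a sketch only: the argument name pswds and the sample vectors are my own assumptions, not part of your code.

```r
# `pswds` is a hypothetical argument; the original reads Final_DF$Pswd globally
makeCounts <- function(x, pswds) {
  list(count = sum(grepl(x, pswds, ignore.case = TRUE)))
}

# Made-up stand-ins for Final_DF$Pswd and worst.pass
pswds <- c("monkey", "Dragon", "coffee", "monkey1")
worst <- c("password", "monkey", "dragon")

# Extra arguments to sapply() are passed on to makeCounts
worst.ct <- sapply(worst, makeCounts, pswds = pswds, simplify = FALSE)
counts <- unlist(worst.ct)
unname(counts)
# [1] 0 2 1
```

The sample data also shows that grepl() counts substring and case-insensitive hits ("monkey1", "Dragon"). If only exact matches should count, a comparison with == or %in% would be the stricter choice.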

Note:

  1. I have options(warnPartialMatchDollar=TRUE, warnPartialMatchAttr=TRUE) set in my ~/.Rprofile (and any project-specific .Rprofile init file) for just this reason: the $ silently does partial matching, and this can be very problematic. With the warning, at least you can see what R is inferring in the background. There is a third option, warnPartialMatchArgs, that has the same intent ... but waaaaaaaaaay too many package authors out there are inadvertently relying on this behavior, so lacking the time/ability to fix them all, I have chosen to muffle this noise-maker.

    Especially if this partial-matching behavior is a surprise to you, I strongly encourage you to set the first two options yourself. In the best case, it produces no warnings and you have the comfort of knowing that you are taking steps to produce more resilient code; at worst, it is noisy and you eventually get tired of the noise and fix the lazy code.

    See ?options for these three among many other available options. (Packages can set their own options as well; an option is similar in concept to Windows' registry, for better or worse, in that it is global to R, and can have arbitrary keys and values.)
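
A small self-contained sketch of the partial matching described in this note, on a made-up two-row frame with the same two column names. The exact wording of the warning can differ between R versions, so the code only captures it rather than printing an expected message:

```r
# Made-up frame with the two similarly named columns
df <- data.frame(Pswd = c("monkey", "dragon"), pswd_length = c(6L, 6L))

# $ finds no exact "pswd" column and silently falls back to pswd_length
partial <- df$pswd
partial
# [1] 6 6

# [[ matches names exactly by default, so the typo is visible as NULL
exact <- df[["pswd"]]
is.null(exact)
# [1] TRUE

# With the option set, the same $ access raises a warning
old <- options(warnPartialMatchDollar = TRUE)
w <- tryCatch(df$pswd, warning = function(w) conditionMessage(w))
options(old)  # restore the previous setting
```

Using [[ with the full column name is another way to avoid relying on partial matching in the first place.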

Upvotes: 2
