chopin_is_the_best

Reputation: 2111

For loop for multiple indices

I know that in R, for loops should generally be avoided in favor of vectorized operations.

I want to solve this first with a for loop, then with the apply family, and finally with Rcpp.

I load a dataset containing one column of passwords (alphanumeric).

Once the data is loaded (only a sample, for speed), I want to create new columns with values 0/1 based on conditions such as "contains_lower_chars", "contains_numbers" and so on.

Here is what I tried, but it doesn't work: every column I create ends up with the same value.

library(tidyverse)
set.seed(123)
# load dataset from url, skip the first 16 rows
df <- read.csv('http://datashaping.com/passwords.txt', header = F, skip = 16) %>%
  sample_frac(.001) %>% 
  rename(password = V1)

patterns = c("[a-z]","[A-Z]","[0-9]+")

df$has_lower <- 0 
df$has_upper <- 0
df$has_numeric <- 0

for(i in 1:nrow(df)){
    for(j in patterns){
        n <- ifelse(grepl(j, df$password[i]),1,0)
        }
    df$has_lower[i] <- n
    df$has_upper[i] <- n 
    df$has_numeric[i] <- n
}

The output I have in mind is:

password    has_lower  has_upper  has_numeric
Bigmaccas           1          1            0
0127515559          0          0            1
dbqky73p            1          0            1
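To reproduce the problem without downloading the password file, here is a minimal sketch that assumes a small hand-made data frame holding only the three example passwords. It runs the loop from the question unchanged apart from the data, and shows that every column ends up holding the result of the last pattern, [0-9]+, because n is overwritten on each pass of the inner loop and only its final value is assigned:

```r
# Hypothetical stand-in for the downloaded sample: the three passwords above
df <- data.frame(password = c("Bigmaccas", "0127515559", "dbqky73p"),
                 stringsAsFactors = FALSE)

patterns <- c("[a-z]", "[A-Z]", "[0-9]+")

df$has_lower <- 0
df$has_upper <- 0
df$has_numeric <- 0

for (i in 1:nrow(df)) {
  for (j in patterns) {
    # n is overwritten on every iteration of the inner loop
    n <- ifelse(grepl(j, df$password[i]), 1, 0)
  }
  # only the result for the last pattern ("[0-9]+") survives
  df$has_lower[i] <- n
  df$has_upper[i] <- n
  df$has_numeric[i] <- n
}

print(df)
```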

Upvotes: 3

Views: 1351

Answers (3)

Frostic

Reputation: 680

First, you need to update has_lower, has_upper and has_numeric inside the j loop. Otherwise n is overwritten on every iteration, and only the result of the last pattern gets written to all three columns. To do so, you need to be able to loop over the column names as well as the patterns:

cols <- c("has_lower", "has_upper", "has_numeric")

for (i in seq_len(nrow(df))) {
  for (j in seq_along(patterns)) {
    # patterns[j] is the regex; cols[j] is the column it fills
    df[i, cols[j]] <- as.numeric(grepl(patterns[j], df$password[i]))
  }
}
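Passing the loop index j straight to grepl, instead of patterns[j], does not raise an error: grepl coerces a non-character pattern with as.character, so the index is matched as a literal digit. A small sketch using the example passwords from the question:

```r
pw <- c("Bigmaccas", "0127515559", "dbqky73p")
patterns <- c("[a-z]", "[A-Z]", "[0-9]+")

# Wrong: the integer index 1 is coerced to the regex "1"
by_index <- grepl(1, pw)

# Right: look up the regular expression stored at position 1
by_pattern <- grepl(patterns[1], pw)

print(by_index)    # only "0127515559" contains the digit 1
print(by_pattern)  # passwords containing a lowercase letter
```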

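A quicker, more compact alternative uses lapply and the fact that grepl is already vectorized. Note that := is data.table syntax, so df has to be a data.table (for example, convert it in place with setDT); the new columns will be logical rather than 0/1:

library(data.table)
setDT(df)
df[, c("has_lower", "has_upper", "has_numeric") := lapply(patterns, function(x) grepl(x, password))]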

Side note (unrelated to your question):

I advise using the fread function from data.table to read your dataset, since the file is quite large and fread is considerably faster than read.csv.

library(data.table)
df = fread('http://datashaping.com/passwords.txt', header = FALSE, skip = 16) %>%
  sample_frac(.001) %>%
  rename(password = V1)

Upvotes: 0

F. Privé

Reputation: 11738

A data frame is above all a list.

So, you can simply do:

df[c("has_lower", "has_upper", "has_numeric")] <- 
  lapply(patterns, function(pattern) grepl(pattern, df$password) + 0)

Use + 0L instead of + 0 if you want integers instead of doubles (I would recommend doing neither and keeping the logicals).
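A short sketch of both points, assuming a hand-made data frame with the three example passwords: because a data frame is a list of columns, assigning a list of vectors to several column names at once adds all of them, and the arithmetic trick decides the column type.

```r
df <- data.frame(password = c("Bigmaccas", "0127515559", "dbqky73p"),
                 stringsAsFactors = FALSE)
patterns <- c("[a-z]", "[A-Z]", "[0-9]+")

# A data frame is a list whose elements are the columns
stopifnot(is.list(df), length(df) == ncol(df))

# Each list element returned by lapply becomes one new column
df[c("has_lower", "has_upper", "has_numeric")] <-
  lapply(patterns, function(pattern) grepl(pattern, df$password) + 0L)

print(df)

# + 0 gives doubles, + 0L gives integers; grepl alone gives logicals
typeof(TRUE + 0)   # "double"
typeof(TRUE + 0L)  # "integer"

# Logicals still work in arithmetic, e.g. counting passwords with a digit
sum(grepl("[0-9]", df$password))  # 2
```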

Upvotes: 0

MrFlick

Reputation: 206606

We can simplify things if we just name your pattern vector. For example:

patterns = c(has_lower="[a-z]",
             has_upper="[A-Z]",
             has_numeric="[0-9]+")

for(pattern in names(patterns)) {
  df[, pattern] = as.numeric(grepl(patterns[pattern], df$password))
}

We loop over the names, look up the regular expression stored under each name, run the match, and store the result in a column with that name.
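Since the question also asks about the apply family, here is a sketch of the same named-pattern idea with vapply, assuming the same three example passwords. vapply calls grepl once per pattern, passes the password vector through its ... argument as x, and keeps the names of patterns as the column names of the resulting logical matrix:

```r
df <- data.frame(password = c("Bigmaccas", "0127515559", "dbqky73p"),
                 stringsAsFactors = FALSE)

patterns <- c(has_lower   = "[a-z]",
              has_upper   = "[A-Z]",
              has_numeric = "[0-9]+")

# One logical column per pattern; rows follow df$password
hits <- vapply(patterns, grepl, FUN.VALUE = logical(nrow(df)),
               x = df$password)

# Convert TRUE/FALSE to 1/0 and bind the named columns onto df
df <- cbind(df, hits + 0)
print(df)
```

This version assumes at least two rows: with a single-row data frame, vapply simplifies its result to a named vector instead of a matrix, and the cbind step no longer lines up.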

Upvotes: 1
