M00N KNIGHT

Reputation: 137

For loop to iterate through dplyr pipe

I'm trying to count how often each value occurs in each column of a dataframe, in order to compress on these fields later.

However, the dataframe has over 60 columns, and writing the code below 60 times is extremely inefficient:

df %>%
    group_by(colname) %>%
    count() %>%
    arrange(desc(n))

Is there a way I can write a for loop that goes through all the column names in the dataframe and produces the pipe result for each one? I tried:

for (i in colnames(df)) {

df %>%
    group_by(colname) %>%
    count() %>%
    arrange(desc(n))

}

But I'm getting an 'i is unknown' error. Any help would be appreciated thanks.

Upvotes: 0

Views: 2744

Answers (2)

mabreitling

Reputation: 606

If I understand correctly, you want to count the occurrences of each unique value in every column. If so, why not combine one of the apply functions with table()?

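set.seed(101)
# Example data: one integer column and two character columns
df <- data.frame("x" = 1:20, "y" = LETTERS[sample(1:26, 20, replace = TRUE)], "z" = letters[sample(1:26, 20, replace = TRUE)])
# Tabulate each column; lapply() always returns a list, whereas sapply() may simplify to a matrix
l <- lapply(df, table)
# Sort each table by count, largest first
lapply(l, sort, decreasing = TRUE)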
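If the counts are needed for later processing, it may help to turn each sorted table into a small data frame. The following is only a sketch of that idea in base R; the names `counts`, `value` and `n` are made up for illustration and the example data is the same toy data frame as above.

```r
# Toy data, same shape as the example above
set.seed(101)
df <- data.frame(
  x = 1:20,
  y = LETTERS[sample(1:26, 20, replace = TRUE)],
  z = letters[sample(1:26, 20, replace = TRUE)]
)

# One data frame per column: the distinct values and how often each occurs,
# ordered from most to least frequent
counts <- lapply(df, function(col) {
  tab <- sort(table(col), decreasing = TRUE)
  data.frame(value = names(tab), n = as.vector(tab))
})

# Inspect the most frequent values of column y
head(counts$y)
```

Each element of `counts` is named after its column, so `counts$z` or `counts[["z"]]` gives the result for a single column.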

Upvotes: 2

Duck

Reputation: 39595

You can try this:

#Load dplyr for %>%, group_by(), count() and arrange()
library(dplyr)
#Data
df <- iris
#Create list
List <- list()
#Compute
for (colname in colnames(df)) {
  
  List[[colname]]<- df %>%
    group_by(df[,colname]) %>%
    count() %>%
    arrange(desc(n))
  
}
#Print
List

$Sepal.Length
# A tibble: 35 x 2
# Groups:   df[, colname] [35]
   `df[, colname]`     n
                   <dbl> <int>
 1                   5      10
 2                   5.1     9
 3                   6.3     9
 4                   5.7     8
 5                   6.7     8
 6                   5.5     7
 7                   5.8     7
 8                   6.4     7
 9                   4.9     6
10                   5.4     6
# ... with 25 more rows

$Sepal.Width
# A tibble: 23 x 2
# Groups:   df[, colname] [23]
   `df[, colname]`     n
                   <dbl> <int>
 1                   3      26
 2                   2.8    14
 3                   3.2    13
 4                   3.4    12
 5                   3.1    11
 6                   2.9    10
 7                   2.7     9
 8                   2.5     8
 9                   3.3     6
10                   3.5     6
# ... with 13 more rows

$Petal.Length
# A tibble: 43 x 2
# Groups:   df[, colname] [43]
   `df[, colname]`     n
                   <dbl> <int>
 1                   1.4    13
 2                   1.5    13
 3                   4.5     8
 4                   5.1     8
 5                   1.3     7
 6                   1.6     7
 7                   5.6     6
 8                   4       5
 9                   4.7     5
10                   4.9     5
# ... with 33 more rows

$Petal.Width
# A tibble: 22 x 2
# Groups:   df[, colname] [22]
   `df[, colname]`     n
                   <dbl> <int>
 1                   0.2    29
 2                   1.3    13
 3                   1.5    12
 4                   1.8    12
 5                   1.4     8
 6                   2.3     8
 7                   0.3     7
 8                   0.4     7
 9                   1       7
10                   2       6
# ... with 12 more rows

$Species
# A tibble: 3 x 2
# Groups:   df[, colname] [3]
  `df[, colname]`     n
  <fct>                 <int>
1 setosa                   50
2 versicolor               50
3 virginica                50
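As for why the loop in the question fails: group_by(colname) looks for a column literally called colname, and the loop variable i is never used inside the pipe. A possible variant of the loop above, assuming dplyr 1.0 or later, refers to the column through the `.data` pronoun, which should also keep the original column name in the result instead of `df[, colname]`. This is a sketch rather than the only way to do it, and `results` is just an illustrative name.

```r
library(dplyr)

df <- iris
results <- list()

for (colname in colnames(df)) {
  # .data[[colname]] looks up the column whose name is stored in colname;
  # sort = TRUE orders the counts from largest to smallest
  results[[colname]] <- df %>%
    count(.data[[colname]], sort = TRUE)
}

# Counts for a single column
results$Species
```

Here count() with sort = TRUE stands in for group_by() followed by count() and arrange(desc(n)), and it returns an ungrouped result.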

Upvotes: 1
