RoyBatty

Count how many times each character appears in the whole dataset

I have a table with twenty columns and thousands of rows.

For example purposes, say I have this table:

ColumnA   ColumnB
Testing      This
1231         1231

I want to count how many times each single character appears in the whole dataset.

So for this toy example the result would be (counting uppercase and lowercase as the same character):

character   times
T                3
e                1
s                2
i                2
n                1
g                1
h                1
1                4
2                2
3                2

I was thinking of using some kind of string manipulation, but I'm not sure how to do this.
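To try the answers, the toy table can be built as an R data frame. This is a sketch that assumes both columns are stored as character (so 1231 is text, not a number); the name `df` matches what the answers use.

```r
# Toy data from the question; both columns are assumed to be character vectors
df <- data.frame(ColumnA = c("Testing", "1231"),
                 ColumnB = c("This", "1231"),
                 stringsAsFactors = FALSE)
str(df)
```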


Answers (5)

Maël

You can use tidytext:

library(tidytext)
library(tidyr)
library(dplyr)

df %>%
  pivot_longer(everything()) %>% 
  unnest_tokens(value, value, token = "characters") %>% 
  count(value)

output

# A tibble: 10 × 2
   value     n
   <chr> <int>
 1 1         4
 2 2         2
 3 3         2
 4 e         1
 5 g         1
 6 h         1
 7 i         2
 8 n         1
 9 s         2
10 t         3
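Note that `unnest_tokens()` lowercases by default (`to_lower = TRUE`), and the character tokenizer it uses from the tokenizers package drops non-alphanumeric characters by default (`strip_non_alphanum = TRUE`), so spaces and punctuation are not counted. On real data this can differ from the plain `strsplit()` answers. A base R sketch that mimics those two defaults, assuming `[[:alnum:]]` is the set of characters you want to keep (the data here is made up for illustration):

```r
# Hypothetical data containing spaces and punctuation
df_punct <- data.frame(a = c("Hi there!", "A-1"), stringsAsFactors = FALSE)

# Split every cell into single characters and ignore case
chars <- tolower(unlist(strsplit(as.matrix(df_punct), "")))

# Keep only letters and digits, similar to the tidytext default
chars <- chars[grepl("[[:alnum:]]", chars)]

counts <- table(chars)
counts
```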

Ronak Shah

This is very similar to the other two base R answers (by Karthik and Robert), but

  1. it does not use the apply family of functions, and
  2. it uses pipes for better readability.

Base R:

df |> 
  as.matrix() |>
  strsplit('') |>
  unlist() |>
  tolower() |>
  table() |>
  stack() |>
  (\(d) setNames(d[2:1], c('character', 'count')))()

#   character count
#1          1     4
#2          2     2
#3          3     2
#4          e     1
#5          g     1
#6          h     1
#7          i     2
#8          n     1
#9          s     2
#10         t     3
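The native pipe `|>` and the `\(d)` lambda shorthand used above need R 4.1.0 or later. For older R versions, here is a sketch of the same steps as nested calls. It swaps `stack()` for `as.data.frame()` on the table, which is a substitution rather than part of the answer above.

```r
df <- data.frame(ColumnA = c("Testing", "1231"),
                 ColumnB = c("This", "1231"),
                 stringsAsFactors = FALSE)

# Split every cell into single characters, ignoring case
chars <- tolower(unlist(strsplit(as.matrix(df), "")))

# Count them and turn the 1-D table into a two-column data frame
counts <- as.data.frame(table(chars), stringsAsFactors = FALSE)
names(counts) <- c("character", "count")
counts
```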

And since you tagged tidyverse, here is the same answer written with tidyverse functions:

library(tidyverse)

df %>%
  as.matrix() %>%
  str_split('') %>%
  flatten_chr() %>%
  tolower() %>%
  table() %>%
  enframe(name = "character", value = "count") %>%
  mutate(count = as.numeric(count))

Chris Ruehlemann

Here's a tidyverse solution:

library(tidyverse)
df %>%
  pivot_longer(everything()) %>%
  separate_rows(value, sep = "(?<!^)(?!$)") %>%
  group_by(char = tolower(value)) %>%
  summarise(N = n())
# A tibble: 10 × 2
   char      N
   <chr> <int>
 1 1         4
 2 2         2
 3 3         2
 4 e         1
 5 g         1
 6 h         1
 7 i         2
 8 n         1
 9 s         2
10 t         3

Karthik S

Does this work?

data.frame(table(strsplit(toupper(paste0(apply(df, 2, paste0, collapse = ''), collapse = '')), split = '')))
   Var1 Freq
1     1    4
2     2    2
3     3    2
4     E    1
5     G    1
6     H    1
7     I    2
8     N    1
9     S    2
10    T    3
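One thing to watch with the `paste0()` step: `paste0()` turns a missing value into the literal text "NA", so any `NA` cells would be counted as the letters N and A. Splitting each cell directly avoids this, because `strsplit()` returns `NA` for a missing element and `table()` drops `NA` by default. A small sketch with a made-up column containing a missing value:

```r
# Made-up data with one missing value
df_na <- data.frame(x = c("ab", NA), stringsAsFactors = FALSE)

# Pasting first: the NA becomes the text "NA" before splitting
via_paste <- table(strsplit(toupper(paste0(apply(df_na, 2, paste0, collapse = ""),
                                           collapse = "")), split = ""))
via_paste  # counts an extra A and an N

# Splitting each cell: the NA stays NA and table() ignores it
via_split <- table(toupper(unlist(strsplit(as.matrix(df_na), ""))))
via_split  # only A and B
```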

Robert Hacken

You can use strsplit and table:

df <- data.frame(ColumnA=c('Testing', '1231'),
                 ColumnB=c('This', '1231'))

table(tolower(unlist(sapply(df, strsplit, ''))))
# 1 2 3 e g h i n s t 
# 4 2 2 1 1 1 2 1 2 3 

This does not distinguish between lowercase and uppercase letters; all are converted to lowercase. If you want to keep that distinction, remove the tolower() call.
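For comparison, a sketch of the case-sensitive variant on the question's toy data, that is, the same call with `tolower()` dropped, so "T" and "t" get separate counts:

```r
df <- data.frame(ColumnA = c("Testing", "1231"),
                 ColumnB = c("This", "1231"),
                 stringsAsFactors = FALSE)

# Same as above but without tolower(): "T" and "t" are counted separately
case_sensitive <- table(unlist(sapply(df, strsplit, "")))
case_sensitive
```

Here "T" (from Testing and This) is counted twice and "t" once, instead of a single "t" with 3.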
