Sam

Reputation: 25

Splitting values in a column

Sorry, I'm new to R. I've got a data frame where each row has an image name and a pipe-separated list of findings (code to recreate it is below).

I'd like to count the number of times each object is mentioned in the findings, so the result would have one row per object along with its count.

I've tried tidyverse and separate but can't seem to get the hang of it. Any help would be amazing, thanks in advance!

To recreate my data:

df <- data.frame(
  col_1 = paste0("image", 1:5),
  findings = c("rock|cat|sun", "cat", "cat|dog|fish|sun", "sun", "dog|cat")
)

Upvotes: 0

Views: 75

Answers (4)

akrun

Reputation: 887951

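An option with cSplit from the splitstackshape package:

library(splitstackshape)
# split col_2 on "|" into long format, then count rows per value
cSplit(df, "col_2", sep = "|", direction = "long")[, .N, by = col_2]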
#   col_2 N
#1:  rock 1
#2:   cat 4
#3:   sun 3
#4:   dog 2
#5:  fish 1

data

df <- structure(list(col_1 = c("image1", "image2", "image3", "image4", 
"image5"), col_2 = c("rock|cat|sun", "cat", "cat|dog|fish|sun", 
"sun", "dog|cat")), class = "data.frame", row.names = c(NA, -5L
))

Upvotes: 1

s_baldur

Reputation: 33743

In base R:

as.data.frame(table(unlist(strsplit(df$col_2, "|", fixed = TRUE))))

#   Var1 Freq
# 1  cat    4
# 2  dog    2
# 3 fish    1
# 4 rock    1
# 5  sun    3

Reproducible data (please provide it in your next post):

df <- data.frame(
  col_1 = paste0("image", 1:5),
  col_2 = c("rock|cat|sun", "cat", "cat|dog|fish|sun", "sun", "dog|cat")
)
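To see what each part of the one-liner does, here is a rough step-by-step sketch of the same idea. The intermediate names (pieces, all_findings, counts) and the renamed output columns are my own choices for illustration, not part of the original answer.

```r
# Same data as above; stringsAsFactors = FALSE keeps col_2 as character
# on R versions before 4.0 (strsplit() does not accept factors)
df <- data.frame(
  col_1 = paste0("image", 1:5),
  col_2 = c("rock|cat|sun", "cat", "cat|dog|fish|sun", "sun", "dog|cat"),
  stringsAsFactors = FALSE
)

# Step 1: split each string on a literal "|".
# fixed = TRUE matters because "|" means alternation in a regular expression.
pieces <- strsplit(df$col_2, "|", fixed = TRUE)

# Step 2: flatten the list into a single character vector
all_findings <- unlist(pieces)

# Step 3: tabulate the values and convert the table to a data frame
counts <- as.data.frame(table(all_findings), stringsAsFactors = FALSE)
names(counts) <- c("finding", "n")
counts
```

table() orders the values alphabetically, which is why the result starts with cat rather than rock.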

Upvotes: 1

iod

Reputation: 7592

Using tidyverse:

library(dplyr)
library(tidyr)

df %>%
  separate_rows(findings) %>%
  group_by(findings) %>%
  summarize(total_count_col = n())

First we convert the data into a long format using separate_rows, then group and count the number of rows with each finding.

Example:

df <- data.frame(
  col1 = c(rep(letters[1:3], 3), "d"),
  col2 = c(rep("moose|cat|dog", 9), "rock"),
  stringsAsFactors = FALSE
)
df %>% separate_rows(col2) %>% group_by(col2) %>% summarize(total_count_col = n())
# # A tibble: 4 x 2
#   col2  total_count_col
#   <chr>           <int>
# 1 cat                 9
# 2 dog                 9
# 3 moose               9
# 4 rock                1

Upvotes: 0

Darren Tsai

Reputation: 35649

You can use separate_rows() and then count().

library(tidyverse)

df %>%
  separate_rows(findings) %>%
  count(findings)

# # A tibble: 5 x 2
#   findings     n
#   <chr>    <int>
# 1 cat          4
# 2 dog          2
# 3 fish         1
# 4 rock         1
# 5 sun          3

Data

df <- structure(list(col_1 = c("image_1", "image_2", "image_3", "image_4", 
"image_5"), findings = c("rock|cat|sun", "cat", "cat|dog|fish|sun", 
"sun", "dog|cat")), class = "data.frame", row.names = c(NA, -5L))
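One caveat worth sketching: separate_rows() by default splits on any run of non-alphanumeric characters (its default is sep = "[^[:alnum:].]+"), so a finding that contains a space or hyphen would be broken apart as well. If that could happen in your data, passing the separator explicitly may be safer. The data frame df2 below is hypothetical and only there to illustrate the point.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one finding ("polar bear") contains a space
df2 <- data.frame(
  col_1 = paste0("image", 1:3),
  findings = c("polar bear|cat", "cat", "polar bear"),
  stringsAsFactors = FALSE
)

# sep is treated as a regular expression, so the pipe has to be escaped
counts <- df2 %>%
  separate_rows(findings, sep = "\\|") %>%
  count(findings)
counts
```

With the default separator, "polar bear" would be counted as two separate findings, "polar" and "bear".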

Upvotes: 1
