I have a decent-sized data frame of tasks done by different people (plus some other information about each task in other columns).
If I get a frequency count of who has done how many tasks, I get something like this (made-up example data):

```
Name            Count
John               27
Jack               14
Jill               31
John,Jack           7
Jack and Jill      11
John/Jill           3
Jack+John,Jill      1
```
My goal is to duplicate jobs done by multiple people. If I then run a frequency count, I want something like this:

```
Name  Count
John     38
Jack     33
Jill     46
```
I need to duplicate any rows of the data frame where multiple people worked on a job so that the same job is listed as being done solely by each person that worked on it.
I have a list of all the names, but not of the various connectors placed between them (I've seen Jack+Jill, Jack/Jill, Jack and Jill, and other connectors between names).
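One way to handle the connectors without listing every combination is to split each Name string on a regular expression. This is a minimal base-R sketch, assuming the only connectors are `,`, `/`, `+` and ` and ` (optionally surrounded by spaces). `connector_pattern` and `split_names()` are hypothetical names used only for illustration.

```r
# Assumed connectors: ",", "/", "+" and " and " (with optional surrounding spaces).
# Extend the pattern if the real data uses other connectors.
connector_pattern <- "\\s*[,/+]\\s*|\\s+and\\s+"

split_names <- function(x) {
  # strsplit() returns a list with one character vector per input string
  pieces <- strsplit(x, connector_pattern, perl = TRUE)
  # Trim any whitespace left around the individual names
  lapply(pieces, trimws)
}

split_names(c("John,Jack", "Jack and Jill", "Jack+John,Jill"))
```

Requiring whitespace around `and` keeps names such as "Sandra" from being split.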
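I'm fairly new to R, and I wrote this:

```r
# data: one row per task; employeenames: character vector of all names
unlisted = unlist(data$Name)
# Rows whose Name contains the first employee, relabelled to that employee
temp1 = data[grepl(employeenames[1], unlisted, fixed = TRUE), ]
temp1[, "Name"] = employeenames[1]
# Append the matching rows for every remaining employee
for (i in seq_along(employeenames)[-1]) {
  temp2 = data[grepl(employeenames[i], unlisted, fixed = TRUE), ]
  temp2[, "Name"] = employeenames[i]
  temp1 = rbind(temp1, temp2)
}
data = temp1
```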
This works as far as I've seen, but I've repeatedly read Stack Overflow answers telling people that `rbind` and `for` loops do not mix. It also seems like too many lines for what should be a simple operation.
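The usual advice behind "`rbind` and `for` don't mix" is that growing a data frame inside a loop copies it on every iteration. A common alternative is to build one piece per person in a list and bind once at the end. This is a sketch of that pattern applied to the same `grepl()` idea; the task-level `data` below is hypothetical, rebuilt from the example counts (one row per task).

```r
# Hypothetical task-level data rebuilt from the example counts
name_strings <- c("John", "Jack", "Jill", "John,Jack", "Jack and Jill",
                  "John/Jill", "Jack+John,Jill")
counts <- c(27, 14, 31, 7, 11, 3, 1)
data <- data.frame(Name = rep(name_strings, counts),
                   Task = seq_len(sum(counts)),
                   stringsAsFactors = FALSE)
employeenames <- c("John", "Jack", "Jill")

# One data frame per employee, collected in a list ...
per_person <- lapply(employeenames, function(person) {
  rows <- data[grepl(person, data$Name, fixed = TRUE), ]
  # rep() also works when a person has zero matching rows
  rows$Name <- rep(person, nrow(rows))
  rows
})
# ... and a single rbind() at the end instead of one per iteration
result <- do.call(rbind, per_person)

table(result$Name)
```

Note that `fixed = TRUE` matches substrings, so a name like "Jill" would also match "Jillian"; splitting on the connectors avoids that.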
Question
What is a faster or more "correct" way to do this?
Here's a tidyverse version:
```r
library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(Name = c("John", "Jack", "Jill", "John,Jack", "Jack and Jill", "John/Jill", "Jack+John,Jill"),
             Count = c(27L, 14L, 31L, 7L, 11L, 3L, 1L))

df %>%
  # Split combined names into one row per person (connectors: , / + " and ")
  separate_rows(Name, sep = '[,/+]| and ') %>%
  group_by(Name) %>%
  summarise(Count = sum(Count))
#> # A tibble: 3 x 2
#>   Name  Count
#>   <chr> <int>
#> 1 Jack     33
#> 2 Jill     46
#> 3 John     38
```
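If adding a tidyverse dependency is not an option, roughly the same steps can be written in base R. This is a sketch that reuses the separator regex from the answer: `strsplit()` stands in for `separate_rows()`, `rep()` with `lengths()` duplicates each Count once per name, and `aggregate()` stands in for `group_by()` plus `summarise()`.

```r
df <- data.frame(
  Name = c("John", "Jack", "Jill", "John,Jack", "Jack and Jill",
           "John/Jill", "Jack+John,Jill"),
  Count = c(27L, 14L, 31L, 7L, 11L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Split each Name on the same separators used with separate_rows()
pieces <- strsplit(df$Name, "[,/+]| and ")

# One row per person, repeating the original Count for each name
long <- data.frame(
  Name = unlist(pieces),
  Count = rep(df$Count, lengths(pieces)),
  stringsAsFactors = FALSE
)

# Sum counts per person
totals <- aggregate(Count ~ Name, data = long, FUN = sum)
totals
```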