J.Doe

Reputation: 809

R Duplicate Specific Rows In A Data Frame

I have a decent-size data frame of tasks done by different people (and some other information about the task in other columns).

If I get a frequency count of who has done how many tasks I get something like this made-up example data:

Name            Count
John            27
Jack            14
Jill            31
John,Jack       7
Jack and Jill   11
John/Jill       3
Jack+John,Jill  1

My goal is to duplicate jobs done by multiple people. If I run a frequency count I want something like this:

Name    Count
John    38
Jack    33
Jill    46

I need to duplicate any rows of the data frame where multiple people worked on a job so that the same job is listed as being done solely by each person that worked on it.

I have a list of all the names, but not of the various connectors put between them (I've got Jack+Jill, Jack/Jill, Jack and Jill, and other connections between names).

I'm fairly new to R, and I wrote this as:

unlisted = unlist(data$"Name")
temp1 = data[grepl(employeenames[1], unlisted, fixed = TRUE), ]
temp1[, "Name"] = employeenames[1]
for(i in 2:length(employeenames)){
  temp2 = data[grepl(employeenames[i], unlisted, fixed = TRUE), ]
  temp2[ ,"Name"] = employeenames[i]
  temp1 = rbind(temp1, temp2)
}
data = temp1

This works, as far as I've seen, but I've repeatedly been told (or rather, read Stack Overflow answers where people have been told) that rbind and for loops do not mix. It also seems like too many lines for what should be a simple operation.

Question

What is a faster or more "correct" way to do this?

Upvotes: 0

Views: 560

Answers (1)

alistaire

Reputation: 43334

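Here's a tidyverse version. separate_rows() splits Name on any of the listed connectors (a comma, slash, plus sign, or " and "), giving one row per person, and the grouped sum then rebuilds the frequency count: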

library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(Name = c("John", "Jack", "Jill", "John,Jack", "Jack and Jill", "John/Jill", "Jack+John,Jill"),
             Count = c(27L, 14L, 31L, 7L, 11L, 3L, 1L))

df %>% 
    separate_rows(Name, sep = '[,/+]| and ') %>% 
    group_by(Name) %>% 
    summarise(Count = sum(Count))
#> # A tibble: 3 x 2
#>   Name  Count
#>   <chr> <int>
#> 1 Jack     33
#> 2 Jill     46
#> 3 John     38

Upvotes: 1
