Reputation: 513
I am trying to aggregate rows in dataframe that have some values similar and others different as below :
dataframe1 <- data.frame(Company_Name = c("KFC", "KFC", "KFC", "McD", "McD"),
Company_ID = c(1, 1, 1, 2, 2),
Company_Phone = c("237389", "-", "-", "237002", "-"),
Employee_Name = c("John", "Mary", "Jane", "Joshua",
"Anne"),
Employee_ID = c(1001, 1002, 1003, 2001, 2002))
I wish to combine the rows for the values that are similar and creating new columns for the values that are different as below:
dataframe2 <- data.frame(Company_Name = c("KFC", "McD"),
Company_ID = c(1, 2),
Company_Phone = c("237389", "237002"),
Employee_Name1 = c("John", "Joshua" ),
Employee_ID1 = c(1001, 2001),
Employee_Name2 = c("Mary", "Anne"),
Employee_ID2 = c(1002, 2002),
Employee_Name3 = c("Jane", "na"),
Employee_ID3 = c(1003, "na"))
I have checked similar questions such as this Combining duplicated rows in R and adding new column containing IDs of duplicates and R: collapse rows and then convert row into a new column but I do not wish to sepoarate the values by commas but rather create new columns.
# Company_Name Company_ID Company_Phone Employee_Name1 Employee_ID1 Employee_Name2 Employee_ID2 Employee_Name3 Employee_ID3
#1 KFC 1 237389 John 1001 Mary 1002 Jane 1003
#2 McD 2 237002 Joshua 2001 Anne 2002 na na
Thank you in advance.
Upvotes: 4
Views: 2068
Reputation: 886938
We could do this with dcast
from data.table
which can take multiple value.var
columns. Convert the 'data.frame' to 'data.table' (setDT(dataframe1)
), grouped by 'Company_Name', replace the 'Company_Phone' _
elements with the first
alphanumeric string, then dcast
from 'long' to 'wide' by specifying 'Employee_Name' and 'Employee_ID' as the value.var
columns
library(data.table)
setDT(dataframe1)[, Company_Phone := first(Company_Phone), Company_Name]
res <- dcast(dataframe1, Company_Name + Company_ID + Company_Phone ~
rowid(Company_Name), value.var = c("Employee_Name", "Employee_ID"), sep='')
-output
res
#Company_Name Company_ID Company_Phone Employee_Name1 Employee_Name2 Employee_Name3 Employee_ID1 Employee_ID2 Employee_ID3
#1: KFC 1 237389 John Mary Jane 1001 1002 1003
#2: McD 2 237002 Joshua Anne NA 2001 2002 NA
If we need to order it
res[, c(1:3, order(as.numeric(sub("\\D+", "", names(res)[-(1:3)]))) + 3), with = FALSE]
# Company_Name Company_ID Company_Phone Employee_Name1 Employee_ID1 Employee_Name2 Employee_ID2 Employee_Name3 Employee_ID3
#1: KFC 1 237389 John 1001 Mary 1002 Jane 1003
#2: McD 2 237002 Joshua 2001 Anne 2002 NA NA
Upvotes: 3
Reputation: 6768
Here is an other approach combining dplyr
and cSplit
library(dplyr)
dataframe1 <- dataframe1 %>%
group_by(Company_Name, Company_ID) %>%
summarise_all(funs(paste((.), collapse = ",")))
library(splitstackshape)
dataframe1 <- cSplit(dataframe1, c("Company_Phone", "Employee_Name", "Employee_ID"), ",")
dataframe1
# Company_Name Company_ID Company_Phone_1 Company_Phone_2 Company_Phone_3 Employee_Name_1 Employee_Name_2 Employee_Name_3 Employee_ID_1 Employee_ID_2 Employee_ID_3
#1: KFC 1 237389 - - John Mary Jane 1001 1002 1003
#2: McD 2 237002 - NA Joshua Anne NA 2001 2002 NA
Upvotes: 3
Reputation: 39154
A solution using tidyverse. dat
is the final output.
library(tidyverse)
dat <- dataframe1 %>%
mutate_if(is.factor, as.character) %>%
mutate(Company_Phone = ifelse(Company_Phone %in% "-", NA, Company_Phone)) %>%
fill(Company_Phone) %>%
group_by(Company_ID) %>%
mutate(ID = 1:n()) %>%
gather(Info, Value, starts_with("Employee_")) %>%
unite(New_Col, Info, ID, sep = "") %>%
spread(New_Col, Value) %>%
select(c("Company_Name", "Company_ID", "Company_Phone",
paste0(rep(c("Employee_ID", "Employee_Name"), 3), rep(1:3, each = 2)))) %>%
ungroup()
# View the result
dat %>% as.data.frame(stringsAsFactors = FALSE)
# Company_Name Company_ID Company_Phone Employee_ID1 Employee_Name1 Employee_ID2 Employee_Name2 Employee_ID3 Employee_Name3
# 1 KFC 1 237389 1001 John 1002 Mary 1003 Jane
# 2 McD 2 237002 2001 Joshua 2002 Anne <NA> <NA>
Upvotes: 4