Reputation: 583
Similar to Merge rows in one data.frame and Merge two rows in one dataframe, when the rows are disjoint and contain nulls I face the following problem for which the aforementioned posts could really help.
The data that I have looks like this
| Date | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00 | A | | |
| 03-07-17 | | A | 09:00 | B |
| 03-07-17 | 17:00 | B | | |
| 03-07-17 | | B | 18:00 | A |
| 04-07-17 | 08:00 | A | | |
| 04-07-17 | | A | 09:00 | B |
| 04-07-17 | 17:00 | B | | |
| 04-07-17 | | B | 18:00 | A |
Now I want to aggregate this into 4 rows looking like:
| Date | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00 | A | 09:00 | B |
| 03-07-17 | 17:00 | B | 18:00 | A |
| 04-07-17 | 08:00 | A | 09:00 | B |
| 04-07-17 | 17:00 | B | 18:00 | A |
Any ideas? Thanks!
Upvotes: 5
Views: 48172
Reputation: 91
More of a round-about way, though it doesn't require the use of dplyr. I am not sure what any of your classes are so based off of your example I pasted the table into excel and saved it as a .csv and just went with what it gave me. Regardless, if you make sure the "empty" indices are actually empty then you can use complete cases.
setwd(Your Working directory)
data = read.csv("exampledata.csv")
data$Date<-as.Date(data$Date,format='%m/%d/%Y')
data$Checkin<-as.character(data$Checkin)
data$Checkin[data$Checkin==""]<-NA
data$Checkout<-as.character(data$Checkout)
data$Checkout[data$Checkout==""]<-NA
checkIns<-data[complete.cases(data$Checkin),]
checkIns$Destination[checkIns$Destination==""]<-NA
checkOuts<-data[complete.cases(data$Checkout),]
data2<-merge(checkIns,checkOuts,by=c("Date","Origin"))
data2 <- data2[,colSums(is.na(data2))<nrow(data2)]
head<-colnames(data)
colnames(data2)<-head
data2
This yielded:
> data2
Date Checkin Origin Checkout Destination
1 3/7/2017 A 8:00 9:00 B
2 3/7/2017 B 17:00 18:00 A
3 4/7/2017 A 8:00 9:00 B
4 4/7/2017 B 17:00 18:00 A
Upvotes: 0
Reputation: 38500
If your data is exactly like the above structure and you have high certainty of this, you can use the following in base R.
cbind(dat[c(TRUE,FALSE), 1:3], dat[c(FALSE, TRUE), 4:5])
Date Checkin Origin Checkout Destination
1 03-07-17 08:00 A 09:00 B
3 03-07-17 17:00 B 18:00 A
5 04-07-17 08:00 A 09:00 B
7 04-07-17 17:00 B 18:00 A
The idea is the take the odd rows (1, 3, 5) for columns 1 through 3 and append to this the even rows (2, 4, 6) for columns 4 and 5.
If any row is out of order or does not have a pair, this won't work.
Upvotes: 1
Reputation: 51582
An idea via dplyr
,
library(dplyr)
df %>%
group_by(Date, Origin) %>%
summarise_all(funs(trimws(paste(., collapse = ''))))
A tibble: 4 x 5 Groups: Date [?] Date Origin Checkin Checkout Destination <chr> <chr> <chr> <chr> <chr> 1 03-07-17 A 08:00 09:00 B 2 03-07-17 B 17:00 18:00 A 3 04-07-17 A 08:00 09:00 B 4 04-07-17 B 17:00 18:00 A
DATA
dput(df)
structure(list(Date = c(" 03-07-17 ", " 03-07-17 ", " 03-07-17 ",
" 03-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 "
), Checkin = c(" 08:00 ", " ", " 17:00 ", " ",
" 08:00 ", " ", " 17:00 ", " "), Origin = c(" A ",
" A ", " B ", " B ", " A ", " A ", " B ",
" B "), Checkout = c(" ", " 09:00 ", " ",
" 18:00 ", " ", " 09:00 ", " ", " 18:00 "
), Destination = c(" ", " B ", " ",
" A ", " ", " B ", " ",
" A ")), .Names = c("Date", "Checkin", "Origin", "Checkout",
"Destination"), row.names = c(NA, -8L), class = "data.frame")
Upvotes: 8