Pascal
Pascal

Reputation: 583

Merge two rows in data.frame

Similar to Merge rows in one data.frame and Merge two rows in one dataframe, when the rows are disjoint and contain nulls I face the following problem for which the aforementioned posts could really help.

The data that I have looks like this

| Date     | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00   | A      |          |             |
| 03-07-17 |         | A      | 09:00    | B           |
| 03-07-17 | 17:00   | B      |          |             |
| 03-07-17 |         | B      | 18:00    | A           |
| 04-07-17 | 08:00   | A      |          |             |
| 04-07-17 |         | A      | 09:00    | B           |
| 04-07-17 | 17:00   | B      |          |             |
| 04-07-17 |         | B      | 18:00    | A           |

Now I want to aggregate this into 4 rows looking like:

| Date     | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00   | A      | 09:00    | B           |
| 03-07-17 | 17:00   | B      | 18:00    | A           |
| 04-07-17 | 08:00   | A      | 09:00    | B           |
| 04-07-17 | 17:00   | B      | 18:00    | A           |

Any ideas? Thanks!

Upvotes: 5

Views: 48172

Answers (3)

More of a round-about way, though it doesn't require the use of dplyr. I am not sure what any of your classes are so based off of your example I pasted the table into excel and saved it as a .csv and just went with what it gave me. Regardless, if you make sure the "empty" indices are actually empty then you can use complete cases.

setwd(Your Working directory)
data = read.csv("exampledata.csv")

data$Date<-as.Date(data$Date,format='%m/%d/%Y')
data$Checkin<-as.character(data$Checkin)
data$Checkin[data$Checkin==""]<-NA

data$Checkout<-as.character(data$Checkout)
data$Checkout[data$Checkout==""]<-NA

checkIns<-data[complete.cases(data$Checkin),]
checkIns$Destination[checkIns$Destination==""]<-NA

checkOuts<-data[complete.cases(data$Checkout),]

data2<-merge(checkIns,checkOuts,by=c("Date","Origin"))
data2 <- data2[,colSums(is.na(data2))<nrow(data2)]
head<-colnames(data)
colnames(data2)<-head

data2

This yielded:

> data2
      Date Checkin Origin Checkout Destination
1 3/7/2017       A   8:00     9:00           B
2 3/7/2017       B  17:00    18:00           A
3 4/7/2017       A   8:00     9:00           B
4 4/7/2017       B  17:00    18:00           A

Upvotes: 0

lmo
lmo

Reputation: 38500

If your data is exactly like the above structure and you have high certainty of this, you can use the following in base R.

cbind(dat[c(TRUE,FALSE), 1:3], dat[c(FALSE, TRUE), 4:5])
        Date   Checkin   Origin   Checkout   Destination
1  03-07-17   08:00     A        09:00      B           
3  03-07-17   17:00     B        18:00      A           
5  04-07-17   08:00     A        09:00      B           
7  04-07-17   17:00     B        18:00      A 

The idea is the take the odd rows (1, 3, 5) for columns 1 through 3 and append to this the even rows (2, 4, 6) for columns 4 and 5.

If any row is out of order or does not have a pair, this won't work.

Upvotes: 1

Sotos
Sotos

Reputation: 51582

An idea via dplyr,

library(dplyr)

df %>% 
 group_by(Date, Origin) %>% 
 summarise_all(funs(trimws(paste(., collapse = ''))))
 A tibble: 4 x 5
 Groups:   Date [?]
        Date   Origin Checkin Checkout Destination
       <chr>    <chr>   <chr>    <chr>       <chr>
1  03-07-17   A         08:00    09:00           B
2  03-07-17   B         17:00    18:00           A
3  04-07-17   A         08:00    09:00           B
4  04-07-17   B         17:00    18:00           A

DATA

dput(df)
structure(list(Date = c(" 03-07-17 ", " 03-07-17 ", " 03-07-17 ", 
" 03-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 "
), Checkin = c(" 08:00   ", "         ", " 17:00   ", "         ", 
" 08:00   ", "         ", " 17:00   ", "         "), Origin = c(" A      ", 
" A      ", " B      ", " B      ", " A      ", " A      ", " B      ", 
" B      "), Checkout = c("          ", " 09:00    ", "          ", 
" 18:00    ", "          ", " 09:00    ", "          ", " 18:00    "
), Destination = c("             ", " B           ", "             ", 
" A           ", "             ", " B           ", "             ", 
" A           ")), .Names = c("Date", "Checkin", "Origin", "Checkout", 
"Destination"), row.names = c(NA, -8L), class = "data.frame")

Upvotes: 8

Related Questions