Format data in R

Question

I have a excel file that contains data in the following format:

Serial          Name          College     Time

Wednesday       24/10/2014
1               StudentA      UA          12:00:00
2               StudentB      UA          13:00:00

Thursday        25/10/2014
3               StudentC      UA          11:00:00
4               StudentA      UA          15:00:00

When converted to CSV, it looks like this:

Wednesday,24/10/2014,,    
1,StudentA,UA,12:00:00
2,StudentB,UA,13:00:00

So, basically, the data is sectioned into a per-day basis. The data for Wednesday, 24/10/2014 is preceded by a row containing Wednesday 24/10/2014 and the same for each day. I want to convert this format to the following:

Serial          Name          College        Date          Time
1               StudentA      UA             24/10/2014    12:00:00
2               StudentB      UA             24/10/2014    13:00:00
3               StudentC      UA             25/10/2014    11:00:00
4               StudentA      UA             25/10/2014    15:00:00

Feel free to ask any questions and use any tools/technologies. I would prefer R, though, since I'm familiar with it.

MrFlick · Accepted Answer

This is a very messy format, but here's one way to possibly deal with it. First, just read in the raw lines, then partition the lines depending on special values

rr <- readLines("input.csv")
rr <- rr[nchar(rr)>0]  #remove empty lines
ghead <- grepl(",,", rr)  # find the "headers" by looking for two empty columns
glines <- rle(cumsum(ghead [-1]))$lengths-1  #see how many rows each group has

#read header and details lines separately
dd <- read.csv(text=rr[!ghead ])
gg <- read.csv(text=rr[ghead ], header=F, 
    col.names=c("Weekday","Date","X","Y"), 
    colClasses=c("character","character","NULL","NULL")) 

#merge together
cbind(dd, gg[rep(1:nrow(gg), glines),])

And this produces

    Serial     Name College     Time   Weekday       Date
1        1 StudentA      UA 12:00:00 Wednesday 24/10/2014
1.1      2 StudentB      UA 13:00:00 Wednesday 24/10/2014
2        3 StudentC      UA 11:00:00  Thursday 25/10/2014
2.1      4 StudentA      UA 15:00:00  Thursday 25/10/2014

Format data in R

Answers (2)

Related Questions