Reputation: 1
I have a data set that has been giving me some trouble. I did not think it would turn into such a complex problem, but now I feel like it has.
I have a data set that looks somewhat similar to this
CaseInfoColNames CaseInfoData UID RunTime
**WarrantCase** XXXXXXXXXXXX P19 TIME
Desc . P19 TIME
Bond 500 P19 TIME
Remarks No License P19 TIME
**WarrantCase** YYYYYYYYYYYY P19 TIME
Desc . P19 TIME
Bond 200 P19 TIME
**CaseNumber** 123456789134 X20 TIME
Desc BOND X20 TIME
Bond 1000 X20 TIME
CommentCase DIV 2 X20 TIME
Charge Drive w.o.Lic X20 TIME
Basically, there are IDs and for each ID there can be two types either a "Warrant Case" or a "Case Number." The rows that follow beneath these two headings should become the column headers. Each ID could have multiple rows and I am realizing that some column headers will not be the same, so is there a way to fill in NAs too. Idealistically, maybe the data could look something like this, but if this is too much of a hassle maybe it is better to leave in the long format? Many thanks!
UID RunTime WarrantCase Desc Bond Remarks
P19 TIME XXXXXXXXXXXX . 500 No License
P19 TIME YYYYYYYYYYYY . 200 NA
X20 TIME NA BOND 1000 NA
CaseNumber CommentCase Charge
NA NA NA
NA NA NA
123.. DIV 2 Drive w.o. Lic
Upvotes: 0
Views: 538
Reputation: 2056
The trick is to create a unique ID for each case so that all the lines from each case have the same ID:
df1 %>%
mutate(caseID = cumsum(as.numeric(grepl("\\*\\*",df1$CaseInfoColNames)))) %>%
spread(CaseInfoColNames,CaseInfoData)
UID RunTime caseID **CaseNumber** **WarrantCase** Bond Charge CommentCase Desc Remarks
1 P19 TIME 1 <NA> XXXXXXXXXXXX 500 <NA> <NA> . No.License
2 P19 TIME 2 <NA> YYYYYYYYYYYY 200 <NA> <NA> . <NA>
3 X20 TIME 3 123456789134 <NA> 1000 Drive.w.o.Lic DIV.2 BOND <NA>
You can remove the caseID column later if you want
Upvotes: 1