Reputation: 1036
I have a data frame like this:
df<-structure(list(Date = structure(c(17605, 18179, 17605, 18508,
17626, 17837, 17963, 17900, 17823, 18008), class = "Date"), Patient = c("Doe, John",
"Doe, John", "Scissorhands, Edward", "Coleman, Ronnie", "Cuomo, Governor",
"Kent, Clark", "Wayne, Bruce", "Lane, Lois", "Gray, Dorian",
"Gray, Dorian"), MRN = c(12345, 12345, 54321,
65432, 765432, 9876, 87654, 111111,
101010, 101010), DOB = structure(c(5254, 5254, -1561,
-10629, 428, 3005, 3156, -2127, -5836, -5836), class = "Date"),
`ICD-10 Billed Procedure Code` = c("Detachment at Right Ring Finger, Mid, Open Approach",
"Detachment at Right Ring Finger, Low, Open Approach", "Detachment at Right Index Finger, Mid, Open Approach",
"Detachment at Left Hand, Complete 5th Ray, Open Approach",
"Detachment at Right Hand, Complete 5th Ray, Open Approach",
"Detachment at Right Little Finger, High, Open Approach",
"Detachment at Right Hand, Partial 1st Ray, Open Approach",
"Detachment at Left Index Finger, High, Open Approach", "Detachment at Right Index Finger, Low, Open Approach",
"Detachment at Left Index Finger, Low, Open Approach"), Admission = structure(c(0,
0, 0, 0, 1, 5, 2, 0, 0, 0), class = "difftime", units = "days")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
row = 456L, col = "Discharge Date", expected = "date like %m/%d/%Y",
actual = "c", file = "'Patients.csv'"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame")))
And I'd like to pivot it wider so that there is only one row per patient. I can do that with this code and it works beautifully on the first ten rows of my data (what you see above):
df<-df%>%pivot_wider(names_from = `ICD-10 Billed Procedure Code`,values_from = c(Date,Admission))
It creates a bunch of columns that have info for the date of when the patients had each surgery and how long they were admitted for each one. Thats fine. I'll organize all those new columns later into what I want.
My problem arises when I try to run it on all of the rows I have (497 total) and you can see it happen here when I add an 11th row.
df2<-structure(list(Date = structure(c(17605, 18179, 17605, 18508,
17626, 17837, 17963, 17900, 17823, 18008, 18008), class = "Date"),
Patient = c("Doe, John",
"Doe, John", "Scissorhands, Edward", "Coleman, Ronnie", "Cuomo, Governor",
"Kent, Clark", "Wayne, Bruce", "Lane, Lois", "Gray, Dorian",
"Gray, Dorian",
"Gray, Dorian"), MRN = c(12345, 12345, 54321,
65432, 765432, 9876, 87654, 111111,
101010, 101010, 101010), DOB = structure(c(5254,
5254, -1561, -10629, 428, 3005, 3156, -2127, -5836, -5836,
-5836), class = "Date"), `ICD-10 Billed Procedure Code` = c("Detachment at Right Ring Finger, Mid, Open Approach",
"Detachment at Right Ring Finger, Low, Open Approach", "Detachment at Right Index Finger, Mid, Open Approach",
"Detachment at Left Hand, Complete 5th Ray, Open Approach",
"Detachment at Right Hand, Complete 5th Ray, Open Approach",
"Detachment at Right Little Finger, High, Open Approach",
"Detachment at Right Hand, Partial 1st Ray, Open Approach",
"Detachment at Left Index Finger, High, Open Approach", "Detachment at Right Index Finger, Low, Open Approach",
"Detachment at Left Index Finger, Low, Open Approach", "Detachment at Right Index Finger, Low, Open Approach"
), Admission = structure(c(0, 0, 0, 0, 1, 5, 2, 0, 0, 0,
0), class = "difftime", units = "days")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
row = 456L, col = "Discharge Date", expected = "date like %m/%d/%Y",
actual = "c", file = "'Patients.csv'"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame")))
And all of the cells that should be NA get NULL instead, and the date formats get all wonky.
Given I know roughly what causes it, I figure it must be the fact that two "Dorian Gray" rows shared the same "Procedure code". (I think? Tell me if I'm wrong) The thing is, that'll be the case a few times and I guess I'm ok with it? ULTIMATELY I'll condense my output to look like: "Patient, MRN, DOB, Number_of_Surgeries, Total_Admission_Time" but I figured I tackle this pivot_wider first before condensing the columns into "Number of surgeries" and "total admission time".
So... how can I fix that error?
Upvotes: 0
Views: 132
Reputation: 160782
It seems odd to me that each of your wider columns would be "Detachment at Right Ring Finger, Mid, Open Approach"
and "Detachment at Right Ring Finger, Low, Open Approach"
. To me, it could make sense to split this into several columns, such as "Finger"
, "Where"
, "Approach"
, etc.
Try this:
library(dplyr)
library(tidyr)
out <- df %>%
mutate(strcapture("Detachment at ([^,]+), ?([^,]+), ?(.*)",
`ICD-10 Billed Procedure Code`,
list(Finger="", Where="", Approach=""))) %>%
group_by(Patient) %>%
mutate(rn = row_number()) %>%
ungroup() %>%
select(-`ICD-10 Billed Procedure Code`) %>%
pivot_wider(
c(Patient, MRN, DOB, Admission),
names_from = "rn",
values_from = c("Date", "Finger", "Where", "Approach"),
names_glue = "{.value}_{rn}")
# out
# A tibble: 8 x 12
# Patient MRN DOB Admission Date_1 Date_2 Finger_1 Finger_2 Where_1 Where_2 Approach_1 Approach_2
# <chr> <dbl> <date> <drtn> <date> <date> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 Doe, John 12345 1984-05-21 0 days 2018-03-15 2019-10-10 Right Ring Finger Right Ring Finger Mid Low Open Approach Open Approach
# 2 Scissorhands, Edward 54321 1965-09-23 0 days 2018-03-15 NA Right Index Finger <NA> Mid <NA> Open Approach <NA>
# 3 Coleman, Ronnie 65432 1940-11-25 0 days 2020-09-03 NA Left Hand <NA> Complete 5th Ray <NA> Open Approach <NA>
# 4 Cuomo, Governor 765432 1971-03-05 1 days 2018-04-05 NA Right Hand <NA> Complete 5th Ray <NA> Open Approach <NA>
# 5 Kent, Clark 9876 1978-03-25 5 days 2018-11-02 NA Right Little Finger <NA> High <NA> Open Approach <NA>
# 6 Wayne, Bruce 87654 1978-08-23 2 days 2019-03-08 NA Right Hand <NA> Partial 1st Ray <NA> Open Approach <NA>
# 7 Lane, Lois 111111 1964-03-06 0 days 2019-01-04 NA Left Index Finger <NA> High <NA> Open Approach <NA>
# 8 Gray, Dorian 101010 1954-01-09 0 days 2018-10-19 2019-04-22 Right Index Finger Left Index Finger Low Low Open Approach Open Approach
As an aesthetic addition, the columns can be grouped better:
cn <- colnames(out)
out[, cn[ order(grepl("[0-9]$", cn), gsub("\\D", "", cn)) ]]
# # A tibble: 8 x 12
# Patient MRN DOB Admission Date_1 Finger_1 Where_1 Approach_1 Date_2 Finger_2 Where_2 Approach_2
# <chr> <dbl> <date> <drtn> <date> <chr> <chr> <chr> <date> <chr> <chr> <chr>
# 1 Doe, John 12345 1984-05-21 0 days 2018-03-15 Right Ring Finger Mid Open Approach 2019-10-10 Right Ring Finger Low Open Approach
# 2 Scissorhands, Edward 54321 1965-09-23 0 days 2018-03-15 Right Index Finger Mid Open Approach NA <NA> <NA> <NA>
# 3 Coleman, Ronnie 65432 1940-11-25 0 days 2020-09-03 Left Hand Complete 5th Ray Open Approach NA <NA> <NA> <NA>
# 4 Cuomo, Governor 765432 1971-03-05 1 days 2018-04-05 Right Hand Complete 5th Ray Open Approach NA <NA> <NA> <NA>
# 5 Kent, Clark 9876 1978-03-25 5 days 2018-11-02 Right Little Finger High Open Approach NA <NA> <NA> <NA>
# 6 Wayne, Bruce 87654 1978-08-23 2 days 2019-03-08 Right Hand Partial 1st Ray Open Approach NA <NA> <NA> <NA>
# 7 Lane, Lois 111111 1964-03-06 0 days 2019-01-04 Left Index Finger High Open Approach NA <NA> <NA> <NA>
# 8 Gray, Dorian 101010 1954-01-09 0 days 2018-10-19 Right Index Finger Low Open Approach 2019-04-22 Left Index Finger Low Open Approach
(I'm guessing there is a better dplyr
-esque way to do this, perhaps using relocate
.)
Upvotes: 1
Reputation: 2650
if you eventually just need the summry data I don't see why you need thie step first, you could simply:
library(dplyr)
df2 %>%
group_by(Date, Patient, MRN, DOB) %>%
summarize(Number_of_Surgeries=length(unique(`ICD-10 Billed Procedure Code`)),
Total_Admission_Time = sum(Admission))
# output
#
# Date Patient MRN DOB Number_of_Surgeries Total_Admission_Time
# <date> <chr> <dbl> <date> <int> <drtn>
# 1 2018-03-15 Doe, John 12345 1984-05-21 1 0 days
# 2 2018-03-15 Scissorhands, Edward 54321 1965-09-23 1 0 days
# 3 2018-04-05 Cuomo, Governor 765432 1971-03-05 1 1 days
# 4 2018-10-19 Gray, Dorian 101010 1954-01-09 1 0 days
# 5 2018-11-02 Kent, Clark 9876 1978-03-25 1 5 days
# 6 2019-01-04 Lane, Lois 111111 1964-03-06 1 0 days
# 7 2019-03-08 Wayne, Bruce 87654 1978-08-23 1 2 days
# 8 2019-04-22 Gray, Dorian 101010 1954-01-09 2 0 days
# 9 2019-10-10 Doe, John 12345 1984-05-21 1 0 days
#10 2020-09-03 Coleman, Ronnie 65432 1940-11-25 1 0 days
Upvotes: 1