Joe Crozier
Joe Crozier

Reputation: 1036

R, pivot_wider issue

I have a data frame like this:

    df<-structure(list(Date = structure(c(17605, 18179, 17605, 18508, 
17626, 17837, 17963, 17900, 17823, 18008), class = "Date"), Patient = c("Doe, John", 
"Doe, John", "Scissorhands, Edward", "Coleman, Ronnie", "Cuomo, Governor", 
"Kent, Clark", "Wayne, Bruce", "Lane, Lois", "Gray, Dorian", 
"Gray, Dorian"), MRN = c(12345, 12345, 54321, 
65432, 765432, 9876, 87654, 111111, 
101010, 101010), DOB = structure(c(5254, 5254, -1561, 
-10629, 428, 3005, 3156, -2127, -5836, -5836), class = "Date"), 
    `ICD-10 Billed Procedure Code` = c("Detachment at Right Ring Finger, Mid, Open Approach", 
    "Detachment at Right Ring Finger, Low, Open Approach", "Detachment at Right Index Finger, Mid, Open Approach", 
    "Detachment at Left Hand, Complete 5th Ray, Open Approach", 
    "Detachment at Right Hand, Complete 5th Ray, Open Approach", 
    "Detachment at Right Little Finger, High, Open Approach", 
    "Detachment at Right Hand, Partial 1st Ray, Open Approach", 
    "Detachment at Left Index Finger, High, Open Approach", "Detachment at Right Index Finger, Low, Open Approach", 
    "Detachment at Left Index Finger, Low, Open Approach"), Admission = structure(c(0, 
    0, 0, 0, 1, 5, 2, 0, 0, 0), class = "difftime", units = "days")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
    row = 456L, col = "Discharge Date", expected = "date like %m/%d/%Y", 
    actual = "c", file = "'Patients.csv'"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")))

And I'd like to pivot it wider so that there is only one row per patient. I can do that with this code and it works beautifully on the first ten rows of my data (what you see above):

df<-df%>%pivot_wider(names_from = `ICD-10 Billed Procedure Code`,values_from = c(Date,Admission))

It creates a bunch of columns that have info for the date of when the patients had each surgery and how long they were admitted for each one. Thats fine. I'll organize all those new columns later into what I want.

My problem arises when I try to run it on all of the rows I have (497 total) and you can see it happen here when I add an 11th row.

    df2<-structure(list(Date = structure(c(17605, 18179, 17605, 18508, 
    17626, 17837, 17963, 17900, 17823, 18008, 18008), class = "Date"), 
        Patient = c("Doe, John", 
"Doe, John", "Scissorhands, Edward", "Coleman, Ronnie", "Cuomo, Governor", 
"Kent, Clark", "Wayne, Bruce", "Lane, Lois", "Gray, Dorian", 
"Gray, Dorian", 
        "Gray, Dorian"), MRN = c(12345, 12345, 54321, 
65432, 765432, 9876, 87654, 111111, 
101010, 101010, 101010), DOB = structure(c(5254, 
        5254, -1561, -10629, 428, 3005, 3156, -2127, -5836, -5836, 
        -5836), class = "Date"), `ICD-10 Billed Procedure Code` = c("Detachment at Right Ring Finger, Mid, Open Approach", 
        "Detachment at Right Ring Finger, Low, Open Approach", "Detachment at Right Index Finger, Mid, Open Approach", 
        "Detachment at Left Hand, Complete 5th Ray, Open Approach", 
        "Detachment at Right Hand, Complete 5th Ray, Open Approach", 
        "Detachment at Right Little Finger, High, Open Approach", 
        "Detachment at Right Hand, Partial 1st Ray, Open Approach", 
        "Detachment at Left Index Finger, High, Open Approach", "Detachment at Right Index Finger, Low, Open Approach", 
        "Detachment at Left Index Finger, Low, Open Approach", "Detachment at Right Index Finger, Low, Open Approach"
        ), Admission = structure(c(0, 0, 0, 0, 1, 5, 2, 0, 0, 0, 
        0), class = "difftime", units = "days")), row.names = c(NA, 
    -11L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
        row = 456L, col = "Discharge Date", expected = "date like %m/%d/%Y", 
        actual = "c", file = "'Patients.csv'"), row.names = c(NA, 
    -1L), class = c("tbl_df", "tbl", "data.frame")))

I get this error: enter image description here

And all of the cells that should be NA get NULL instead, and the date formats get all wonky.

Given I know roughly what causes it, I figure it must be the fact that two "Dorian Gray" rows shared the same "Procedure code". (I think? Tell me if I'm wrong) The thing is, that'll be the case a few times and I guess I'm ok with it? ULTIMATELY I'll condense my output to look like: "Patient, MRN, DOB, Number_of_Surgeries, Total_Admission_Time" but I figured I tackle this pivot_wider first before condensing the columns into "Number of surgeries" and "total admission time".

So... how can I fix that error?

Upvotes: 0

Views: 132

Answers (2)

r2evans
r2evans

Reputation: 160782

It seems odd to me that each of your wider columns would be "Detachment at Right Ring Finger, Mid, Open Approach" and "Detachment at Right Ring Finger, Low, Open Approach". To me, it could make sense to split this into several columns, such as "Finger", "Where", "Approach", etc.

Try this:

library(dplyr)
library(tidyr)
out <- df %>%
  mutate(strcapture("Detachment at ([^,]+), ?([^,]+), ?(.*)",
                    `ICD-10 Billed Procedure Code`,
                    list(Finger="", Where="", Approach=""))) %>%
  group_by(Patient) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  select(-`ICD-10 Billed Procedure Code`) %>%
  pivot_wider(
    c(Patient, MRN, DOB, Admission),
    names_from = "rn",
    values_from = c("Date", "Finger", "Where", "Approach"),
    names_glue = "{.value}_{rn}")
# out
# A tibble: 8 x 12
#   Patient                 MRN DOB        Admission Date_1     Date_2     Finger_1            Finger_2          Where_1          Where_2 Approach_1    Approach_2   
#   <chr>                 <dbl> <date>     <drtn>    <date>     <date>     <chr>               <chr>             <chr>            <chr>   <chr>         <chr>        
# 1 Doe, John             12345 1984-05-21 0 days    2018-03-15 2019-10-10 Right Ring Finger   Right Ring Finger Mid              Low     Open Approach Open Approach
# 2 Scissorhands, Edward  54321 1965-09-23 0 days    2018-03-15 NA         Right Index Finger  <NA>              Mid              <NA>    Open Approach <NA>         
# 3 Coleman, Ronnie       65432 1940-11-25 0 days    2020-09-03 NA         Left Hand           <NA>              Complete 5th Ray <NA>    Open Approach <NA>         
# 4 Cuomo, Governor      765432 1971-03-05 1 days    2018-04-05 NA         Right Hand          <NA>              Complete 5th Ray <NA>    Open Approach <NA>         
# 5 Kent, Clark            9876 1978-03-25 5 days    2018-11-02 NA         Right Little Finger <NA>              High             <NA>    Open Approach <NA>         
# 6 Wayne, Bruce          87654 1978-08-23 2 days    2019-03-08 NA         Right Hand          <NA>              Partial 1st Ray  <NA>    Open Approach <NA>         
# 7 Lane, Lois           111111 1964-03-06 0 days    2019-01-04 NA         Left Index Finger   <NA>              High             <NA>    Open Approach <NA>         
# 8 Gray, Dorian         101010 1954-01-09 0 days    2018-10-19 2019-04-22 Right Index Finger  Left Index Finger Low              Low     Open Approach Open Approach

As an aesthetic addition, the columns can be grouped better:

cn <- colnames(out)
out[, cn[ order(grepl("[0-9]$", cn), gsub("\\D", "", cn)) ]]
# # A tibble: 8 x 12
#   Patient                 MRN DOB        Admission Date_1     Finger_1            Where_1          Approach_1    Date_2     Finger_2          Where_2 Approach_2   
#   <chr>                 <dbl> <date>     <drtn>    <date>     <chr>               <chr>            <chr>         <date>     <chr>             <chr>   <chr>        
# 1 Doe, John             12345 1984-05-21 0 days    2018-03-15 Right Ring Finger   Mid              Open Approach 2019-10-10 Right Ring Finger Low     Open Approach
# 2 Scissorhands, Edward  54321 1965-09-23 0 days    2018-03-15 Right Index Finger  Mid              Open Approach NA         <NA>              <NA>    <NA>         
# 3 Coleman, Ronnie       65432 1940-11-25 0 days    2020-09-03 Left Hand           Complete 5th Ray Open Approach NA         <NA>              <NA>    <NA>         
# 4 Cuomo, Governor      765432 1971-03-05 1 days    2018-04-05 Right Hand          Complete 5th Ray Open Approach NA         <NA>              <NA>    <NA>         
# 5 Kent, Clark            9876 1978-03-25 5 days    2018-11-02 Right Little Finger High             Open Approach NA         <NA>              <NA>    <NA>         
# 6 Wayne, Bruce          87654 1978-08-23 2 days    2019-03-08 Right Hand          Partial 1st Ray  Open Approach NA         <NA>              <NA>    <NA>         
# 7 Lane, Lois           111111 1964-03-06 0 days    2019-01-04 Left Index Finger   High             Open Approach NA         <NA>              <NA>    <NA>         
# 8 Gray, Dorian         101010 1954-01-09 0 days    2018-10-19 Right Index Finger  Low              Open Approach 2019-04-22 Left Index Finger Low     Open Approach

(I'm guessing there is a better dplyr-esque way to do this, perhaps using relocate.)

Upvotes: 1

DS_UNI
DS_UNI

Reputation: 2650

if you eventually just need the summry data I don't see why you need thie step first, you could simply:

library(dplyr)
df2 %>%
 group_by(Date, Patient, MRN, DOB) %>%
summarize(Number_of_Surgeries=length(unique(`ICD-10 Billed Procedure Code`)),
          Total_Admission_Time = sum(Admission))
# output
# 
# Date       Patient                 MRN DOB        Number_of_Surgeries Total_Admission_Time
# <date>     <chr>                 <dbl> <date>                   <int> <drtn>              
# 1 2018-03-15 Doe, John             12345 1984-05-21                   1 0 days              
# 2 2018-03-15 Scissorhands, Edward  54321 1965-09-23                   1 0 days              
# 3 2018-04-05 Cuomo, Governor      765432 1971-03-05                   1 1 days              
# 4 2018-10-19 Gray, Dorian         101010 1954-01-09                   1 0 days              
# 5 2018-11-02 Kent, Clark            9876 1978-03-25                   1 5 days              
# 6 2019-01-04 Lane, Lois           111111 1964-03-06                   1 0 days              
# 7 2019-03-08 Wayne, Bruce          87654 1978-08-23                   1 2 days              
# 8 2019-04-22 Gray, Dorian         101010 1954-01-09                   2 0 days              
# 9 2019-10-10 Doe, John             12345 1984-05-21                   1 0 days              
#10 2020-09-03 Coleman, Ronnie       65432 1940-11-25                   1 0 days 

Upvotes: 1

Related Questions