theCed7
theCed7

Reputation: 145

Create a new row based on the difference of previous rows in R

I have extracted a html table using rvest and received a dataframe like this:

data <- tribble(
  ~ward,               ~patient_count, ~date,
  "ICU",                10,            "2021-01-01",
  "of which ventilated", 6,            "2021-01-01",
  "other",               4,            "2021-01-01", 
  "ICU",                 8,            "2021-01-02",
  "of which ventilated", 2,            "2021-01-02",
  "other",               4,            "2021-01-02", 
)

I'm unhappy with this data layout. My first thought was to try to get something like this:

data <- tribble(
  ~ward,               ~patient_count, ~ventilated, ~date,
  "ICU",                 4,            FALSE,       "2021-01-01",
  "ICU",                 6,            TRUE,        "2021-01-01",
  "other",               4,            FALSE,       "2021-01-01",
  "ICU",                 6,            FALSE,       "2021-01-02",
  "ICU",                 2,            TRUE,        "2021-01-02",
  "other",               4,            FALSE,       "2021-01-02",
)

I tried setting the ventilated tag like this:

data <- data %>% 
  mutate(
    ventilated = case_when(ward == "of which ventilated" ~ TRUE,
                           ward != "of which ventilated" ~ FALSE)
  ) 

However, I'm not quite sure what's the best way to get the numbers right.

Upvotes: 1

Views: 188

Answers (2)

Dave2e
Dave2e

Reputation: 24079

Maybe try pivoting the data:

library(tidyr)

data <- tribble(
   ~ward,               ~patient_count, ~date,
   "ICU",                10,            "2021-01-01",
   "of which ventilated", 6,            "2021-01-01",
   "other",               4,            "2021-01-01", 
   "ICU",                 8,            "2021-01-02",
   "of which ventilated", 2,            "2021-01-02",
   "other",               4,            "2021-01-02", 
)

pivot_wider(data, names_from = "ward", values_from = "patient_count")


# A tibble: 2 × 4
  date         ICU `of which ventilated` other
  <chr>      <dbl>                 <dbl> <dbl>
1 2021-01-01    10                     6     4
2 2021-01-02     8                     2     4

Upvotes: 0

Kra.P
Kra.P

Reputation: 15123

Using case_when similarly,(I made it as patient_count_2)

I wonder if there's a more detailed rule to change patient_count. Thanks to @r2evans, to consider robust way, if format of data can be different with the data you provided? Or always like ICU then of which ventilated?

data %>%
  mutate(patient_count_2 = case_when(
    lead(ward) == "of which ventilated" ~ patient_count - lead(patient_count),
    TRUE ~ patient_count
  ),
  ventilated = (ward == "of which ventilated"), 
  ward = ifelse(ward == "of which ventilated", "ICU", ward)
  )


  ward  patient_count date       patient_count_2 ventilated
  <chr>         <dbl> <chr>                <dbl> <lgl>     
1 ICU              10 2021-01-01               4 FALSE     
2 ICU               6 2021-01-01               6 TRUE      
3 other             4 2021-01-01               4 FALSE     
4 ICU               8 2021-01-02               6 FALSE     
5 ICU               2 2021-01-02               2 TRUE      
6 other             4 2021-01-02               4 FALSE   

Upvotes: 1

Related Questions