Arvid Gustafsson
Arvid Gustafsson

Reputation: 21

Combine and copy columns on condition

I am using the forestploter package to create a plot. But this question is more related to handling of a data.frame. I wish to create a new variable called ci1, and this works (from the original package) by combining data from 6 columns (est_gp1, low_gp1, high_gp1 and so on).

First wish: However, the added word Ref makes an error in the sprintf( function below. Removing it and the code works. But I wish to have the word "Ref" copied to the variable ci1 for the row male. I have tried with different if else, ifelse and cant get it to work. Also tried by selection rows on condition "Ref".

Second wish: In the row age I have no data for all _gp2's. However, when running the code below the ci1 will be empty for age. I wish it would include the values for _gp1's, basically half of the sprintf( function.

This is how I wish the table would look:

variable est_gp1 low_gp1 hi_gp1 est_gp2 low_gp2 hi_gp2 ci1
sex NA NA NA NA NA NA
female 1.04 1.14 1.1 1.05 1.2 1.08 1.04 (1.14-1.1) 1.05 (1.2-1.08)
male Ref NA NA NA NA NA Ref
age 1.1 1.00 1.3 NA NA NA 1.1 (1.00-1.3)
a_f <- data.frame (
  variable = c("sex", "female", "male", "age"),
  est_gp1 = c(NA,   1.04, "Ref", 1.1),
  low_gp1   = c(NA, 1.14, NA, 1.0),
  hi_gp1    = c(NA, 1.10, NA, 1.3),
  est_gp2   = c(NA, 1.05, NA, NA),
  low_gp2   = c(NA, 1.20, NA, NA),
  hi_gp2 = c(NA, 1.08, NA, NA))


a_f$ci1 <- paste(sprintf("%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1),
                sprintf("%.1f (%.1f-%.1f)", a_f$est_gp2, a_f$low_gp2, a_f$hi_gp2),
                sep = "\n")

a_f$ci1[grepl("NA", a_f$ci1)] <- "" # Any NA to blank

I have tried different:

if else

and ifelse functions.

And using dplyr package:

a_f %>% 
  mutate(chi1 = ifelse(a_f$est_gp1 == 'Ref', paste(a_f$est_gp1), paste(sprintf("%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1),
                sprintf("%.1f (%.1f-%.1f)", a_f$est_gp2, a_f$low_gp2, a_f$hi_gp2),
                sep = "\n")$chi1))

And this:

transform(a_f,
          ci1 = ifelse (est_gp1 == "Ref",  "Ref",
          paste(sprintf("%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1),
                sprintf("%.1f (%.1f-%.1f)", a_f$est_gp2, a_f$low_gp2, a_f$hi_gp2),
                sep = "\n")))

Upvotes: 1

Views: 73

Answers (3)

Arvid Gustafsson
Arvid Gustafsson

Reputation: 21

I ended up using another column which explains if it is a "Ref" row or not.

  a_f <- data.frame (
  variable = c("sex", "female", "male", "age"),
  est_gp1 = c(NA,   1.04, NA, 1.1),
  low_gp1   = c(NA, 1.14, NA, 1.0),
  hi_gp1    = c(NA, 1.10, NA, 1.3),
  est_gp3   = c(NA, 1.05, NA, NA),
  low_gp3   = c(NA, 1.20, NA, NA),
  hi_gp3 = c(NA, 1.08, NA, NA),
  is_ref = c(NA, NA, 1, NA))

And then using this:

a_f$ci1 <- ifelse(is.na(a_f$is_ref), ifelse(is.na(a_f$est_gp3), paste(sprintf( "%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1)), paste(sprintf( "%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1), sprintf( "%.1f (%.1f-%.1f)", a_f$est_gp3, a_f$low_gp3, a_f$hi_gp3), sep="\n" )), "Ref")

Upvotes: 0

I_O
I_O

Reputation: 6911

An alternative approach. The code is more universal (accepts an arbitrary number of est-low-hi-triplets) at the cost of brevity:


a_f |> 
  mutate(id = row_number(), .before = 1,
         across(where(is.numeric), ~ sprintf('%.2f', .x)), ## round to 2 digits
         across(-c(id, variable), ~ {
           as.character(.x)
           ifelse(est_gp1 == 'Ref', 'Ref',  .x)
         })
  ) |>
  pivot_longer(-c(id, variable)) |>
  replace_na(list(value = '')) |> 
  separate(name, into = c('stat', 'group'), sep = '_') |>
  pivot_wider(names_from = 'stat', values_from = 'value') |>
  arrange(id, group) |> 
  mutate(ci = case_when(est == 'Ref' ~ 'Ref',
                         est == '' ~ '',
                         .default = sprintf('%s (%s-%s)', est, low, hi)
                        ),
        .by = c(id, group)
  ) |>
  pivot_wider(names_from = group,
              values_from = est:ci,
              names_vary = 'slowest'
  )

result:


    # A tibble: 4 × 10
      id    variable est_gp1 low_gp1 hi_gp1 ci_gp1             est_gp2 low_gp2 hi_gp2 ci_gp2            
      <chr> <chr>    <chr>   <chr>   <chr>  <chr>              <chr>   <chr>   <chr>  <chr>             
    1 1.00  sex      ""      ""      ""     ""                 ""      ""      ""     ""                
    2 2.00  female   "1.04"  "1.14"  "1.10" "1.04 (1.14-1.10)" "1.05"  "1.20"  "1.08" "1.05 (1.20-1.08)"
    3 3.00  male     "Ref"   "Ref"   "Ref"  "Ref"              "Ref"   "Ref"   "Ref"  "Ref"             
    4 4.00  age      "1.1"   "1.00"  "1.30" "1.1 (1.00-1.30)"  "NA"    "NA"    "NA"   "NA (NA-NA)"      
            
     

Upvotes: 0

crf
crf

Reputation: 1880

In R, a vector has to be all the same kind of data, and a data.frame is essentially a list of vectors. The problem you're running into stems from your attempt to mix data types in the est_gp1 column.

est_gp1 = c(NA, 1.04, "Ref", 1.1),

Here you're trying to assign both numeric and character values into the same vector. What R does here is is "coerces" this vector to a character vector, so that all values can be the same type. You can see this if you print a_f$est_gpt again.

> a_f$est_gp1
[1] NA     "1.04" "Ref"  "1.1"

the quotation marks indicate that this is a character vector. Since the numbers have been transformed to characters, your call to sprintf doesn't work.

I think what you really want to do is something like this.

library(dplyr)
a_f <- data.frame (
  variable = c("sex", "female", "male", "age"),
  est_gp1 = c(NA,   1.04, NA, 1.1),
  low_gp1   = c(NA, 1.14, NA, 1.0),
  hi_gp1    = c(NA, 1.10, NA, 1.3),
  est_gp2   = c(NA, 1.05, NA, NA),
  low_gp2   = c(NA, 1.20, NA, NA),
  hi_gp2 = c(NA, 1.08, NA, NA)
)


a_f |>
  mutate(ci1 = case_when(
    is.na(est_gp1) ~ '',
    variable == 'male' ~ 'Ref',
    variable == 'age'  ~ sprintf("%.1f (%.1f-%.1f)", est_gp1, low_gp1, hi_gp1),
    TRUE               ~ paste(
      sprintf("%.1f (%.1f-%.1f)", est_gp1, low_gp1, hi_gp1),
      sprintf("%.1f (%.1f-%.1f)", est_gp2, low_gp2, hi_gp2),
      sep = "\n"
    )
  ))
#>   variable est_gp1 low_gp1 hi_gp1 est_gp2 low_gp2 hi_gp2
#> 1      sex      NA      NA     NA      NA      NA     NA
#> 2   female    1.04    1.14    1.1    1.05     1.2   1.08
#> 3     male      NA      NA     NA      NA      NA     NA
#> 4      age    1.10    1.00    1.3      NA      NA     NA
#>                            ci1
#> 1                             
#> 2 1.0 (1.1-1.1)\n1.1 (1.2-1.1)
#> 3                             
#> 4                1.1 (1.0-1.3)

Created on 2024-09-16 with reprex v2.0.2

Upvotes: 0

Related Questions