Reputation: 21
I am using the forestploter package to create a plot. But this question is more related to handling of a data.frame. I wish to create a new variable called ci1, and this works (from the original package) by combining data from 6 columns (est_gp1, low_gp1, high_gp1 and so on).
First wish: However, the added word Ref makes an error in the sprintf( function below. Removing it and the code works. But I wish to have the word "Ref" copied to the variable ci1 for the row male. I have tried with different if else, ifelse and cant get it to work. Also tried by selection rows on condition "Ref".
Second wish: In the row age I have no data for all _gp2's. However, when running the code below the ci1 will be empty for age. I wish it would include the values for _gp1's, basically half of the sprintf( function.
This is how I wish the table would look:
variable | est_gp1 | low_gp1 | hi_gp1 | est_gp2 | low_gp2 | hi_gp2 | ci1 |
---|---|---|---|---|---|---|---|
sex | NA | NA | NA | NA | NA | NA | |
female | 1.04 | 1.14 | 1.1 | 1.05 | 1.2 | 1.08 | 1.04 (1.14-1.1) 1.05 (1.2-1.08) |
male | Ref | NA | NA | NA | NA | NA | Ref |
age | 1.1 | 1.00 | 1.3 | NA | NA | NA | 1.1 (1.00-1.3) |
a_f <- data.frame (
variable = c("sex", "female", "male", "age"),
est_gp1 = c(NA, 1.04, "Ref", 1.1),
low_gp1 = c(NA, 1.14, NA, 1.0),
hi_gp1 = c(NA, 1.10, NA, 1.3),
est_gp2 = c(NA, 1.05, NA, NA),
low_gp2 = c(NA, 1.20, NA, NA),
hi_gp2 = c(NA, 1.08, NA, NA))
a_f$ci1 <- paste(sprintf("%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1),
sprintf("%.1f (%.1f-%.1f)", a_f$est_gp2, a_f$low_gp2, a_f$hi_gp2),
sep = "\n")
a_f$ci1[grepl("NA", a_f$ci1)] <- "" # Any NA to blank
I have tried different:
if else
and ifelse functions.
And using dplyr package:
a_f %>%
mutate(chi1 = ifelse(a_f$est_gp1 == 'Ref', paste(a_f$est_gp1), paste(sprintf("%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1),
sprintf("%.1f (%.1f-%.1f)", a_f$est_gp2, a_f$low_gp2, a_f$hi_gp2),
sep = "\n")$chi1))
And this:
transform(a_f,
ci1 = ifelse (est_gp1 == "Ref", "Ref",
paste(sprintf("%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1),
sprintf("%.1f (%.1f-%.1f)", a_f$est_gp2, a_f$low_gp2, a_f$hi_gp2),
sep = "\n")))
Upvotes: 1
Views: 73
Reputation: 21
I ended up using another column which explains if it is a "Ref" row or not.
a_f <- data.frame (
variable = c("sex", "female", "male", "age"),
est_gp1 = c(NA, 1.04, NA, 1.1),
low_gp1 = c(NA, 1.14, NA, 1.0),
hi_gp1 = c(NA, 1.10, NA, 1.3),
est_gp3 = c(NA, 1.05, NA, NA),
low_gp3 = c(NA, 1.20, NA, NA),
hi_gp3 = c(NA, 1.08, NA, NA),
is_ref = c(NA, NA, 1, NA))
And then using this:
a_f$ci1 <- ifelse(is.na(a_f$is_ref), ifelse(is.na(a_f$est_gp3), paste(sprintf( "%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1)), paste(sprintf( "%.1f (%.1f-%.1f)", a_f$est_gp1, a_f$low_gp1, a_f$hi_gp1), sprintf( "%.1f (%.1f-%.1f)", a_f$est_gp3, a_f$low_gp3, a_f$hi_gp3), sep="\n" )), "Ref")
Upvotes: 0
Reputation: 6911
An alternative approach. The code is more universal (accepts an arbitrary number of est-low-hi-triplets) at the cost of brevity:
a_f |>
mutate(id = row_number(), .before = 1,
across(where(is.numeric), ~ sprintf('%.2f', .x)), ## round to 2 digits
across(-c(id, variable), ~ {
as.character(.x)
ifelse(est_gp1 == 'Ref', 'Ref', .x)
})
) |>
pivot_longer(-c(id, variable)) |>
replace_na(list(value = '')) |>
separate(name, into = c('stat', 'group'), sep = '_') |>
pivot_wider(names_from = 'stat', values_from = 'value') |>
arrange(id, group) |>
mutate(ci = case_when(est == 'Ref' ~ 'Ref',
est == '' ~ '',
.default = sprintf('%s (%s-%s)', est, low, hi)
),
.by = c(id, group)
) |>
pivot_wider(names_from = group,
values_from = est:ci,
names_vary = 'slowest'
)
result:
# A tibble: 4 × 10
id variable est_gp1 low_gp1 hi_gp1 ci_gp1 est_gp2 low_gp2 hi_gp2 ci_gp2
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1.00 sex "" "" "" "" "" "" "" ""
2 2.00 female "1.04" "1.14" "1.10" "1.04 (1.14-1.10)" "1.05" "1.20" "1.08" "1.05 (1.20-1.08)"
3 3.00 male "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref"
4 4.00 age "1.1" "1.00" "1.30" "1.1 (1.00-1.30)" "NA" "NA" "NA" "NA (NA-NA)"
Upvotes: 0
Reputation: 1880
In R, a vector has to be all the same kind of data, and a data.frame is essentially a list of vectors. The problem you're running into stems from your attempt to mix data types in the est_gp1
column.
est_gp1 = c(NA, 1.04, "Ref", 1.1),
Here you're trying to assign both numeric and character values into the same vector. What R does here is is "coerces" this vector to a character vector, so that all values can be the same type. You can see this if you print a_f$est_gpt
again.
> a_f$est_gp1
[1] NA "1.04" "Ref" "1.1"
the quotation marks indicate that this is a character vector. Since the numbers have been transformed to characters, your call to sprintf doesn't work.
I think what you really want to do is something like this.
library(dplyr)
a_f <- data.frame (
variable = c("sex", "female", "male", "age"),
est_gp1 = c(NA, 1.04, NA, 1.1),
low_gp1 = c(NA, 1.14, NA, 1.0),
hi_gp1 = c(NA, 1.10, NA, 1.3),
est_gp2 = c(NA, 1.05, NA, NA),
low_gp2 = c(NA, 1.20, NA, NA),
hi_gp2 = c(NA, 1.08, NA, NA)
)
a_f |>
mutate(ci1 = case_when(
is.na(est_gp1) ~ '',
variable == 'male' ~ 'Ref',
variable == 'age' ~ sprintf("%.1f (%.1f-%.1f)", est_gp1, low_gp1, hi_gp1),
TRUE ~ paste(
sprintf("%.1f (%.1f-%.1f)", est_gp1, low_gp1, hi_gp1),
sprintf("%.1f (%.1f-%.1f)", est_gp2, low_gp2, hi_gp2),
sep = "\n"
)
))
#> variable est_gp1 low_gp1 hi_gp1 est_gp2 low_gp2 hi_gp2
#> 1 sex NA NA NA NA NA NA
#> 2 female 1.04 1.14 1.1 1.05 1.2 1.08
#> 3 male NA NA NA NA NA NA
#> 4 age 1.10 1.00 1.3 NA NA NA
#> ci1
#> 1
#> 2 1.0 (1.1-1.1)\n1.1 (1.2-1.1)
#> 3
#> 4 1.1 (1.0-1.3)
Created on 2024-09-16 with reprex v2.0.2
Upvotes: 0