Reputation: 154
I have imported a table that looks like this:
df <- data.frame(study=c("A", "", "", "B", "C", ""),
outcome=c("mortality", "mortality", "surgery", "mortality", "mortality", "surgery"),
time.point=c("30d", "1y", "10d", "1y", "5y", "20d"))
The 2nd and 3rd outcomes belong to study A, and the 6th outcome belongs to study C. My table has many cases like this, with an irregular number of outcomes and time points per study.
How can I give each row a name that indicates its study, outcome, and time point?
I want the result to look like this:
df_new <- data.frame(study=c("A", "", "", "B", "C", ""),
outcome=c("mortality", "mortality", "surgery", "mortality", "mortality", "surgery"),
time.point=c("30d", "1y", "10d", "1y", "5y", "20d"),
rowname=c("A_mortality_30d", "A_mortality_1y", "A_surgery_10d", "B_mortality_1y", "C_mortality_5y", "C_surgery_20d"))
Thank you so much!
Reputation: 303
Base R solution: use grep to get the row numbers of the non-empty studies, use diff to count how many rows each study covers, and then repeat each study name that many times with rep.
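# study names in the order they appear
studies <- df[df$study != "", "study"]
# rows covered by each study: gap between consecutive non-empty rows (and past the last row)
reps <- diff(c(grep(".", df$study), nrow(df) + 1))
rownames(df) <- paste(rep(studies, reps), df$outcome, df$time.point, sep = "_")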
> df
study outcome time.point
A_mortality_30d A mortality 30d
A_mortality_1y mortality 1y
A_surgery_10d surgery 10d
B_mortality_1y B mortality 1y
C_mortality_5y C mortality 5y
C_surgery_20d surgery 20d
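To show what each step of this base R approach computes, here are the intermediate values for the example df from the question. This is a sketch. The names starts, filled and row_labels are introduced here only for illustration, and stringsAsFactors = FALSE is added so the columns are character vectors in older R versions too.

```r
df <- data.frame(study = c("A", "", "", "B", "C", ""),
                 outcome = c("mortality", "mortality", "surgery",
                             "mortality", "mortality", "surgery"),
                 time.point = c("30d", "1y", "10d", "1y", "5y", "20d"),
                 stringsAsFactors = FALSE)

# Rows where a study name is present: the regex "." matches any non-empty string
starts <- grep(".", df$study)
starts        # 1 4 5

# Length of each study block: distance to the next start (or to one past the last row)
reps <- diff(c(starts, nrow(df) + 1))
reps          # 3 1 2

# Repeat each study name over its block
filled <- rep(df$study[starts], reps)
filled        # "A" "A" "A" "B" "C" "C"

row_labels <- paste(filled, df$outcome, df$time.point, sep = "_")
```

This relies on the first row having a study name. If it were empty, the first block would have no study to repeat.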
Reputation: 78937
Credit to Oliver: the first part is taken from his answer, which was posted first.
Then you can use unite from the tidyr package:
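library(tidyr)
library(dplyr)

df1 <- df %>%
  # turn empty study names into NA so that fill() can replace them
  mutate(study = case_when(study == "" ~ NA_character_,
                           TRUE ~ study)) %>%
  # carry the last non-missing study name down
  fill(study, .direction = 'down') %>%
  # combine the three columns into rowname, keeping the original columns
  unite(rowname, study, outcome, time.point, sep = "_", remove = FALSE)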
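The fill(.direction = 'down') step is a "last observation carried forward" operation. If you want to avoid the tidyverse dependencies, here is one base R sketch of the same idea. It is not part of the answer above. Each row is indexed by the most recent row at or above it that has a study name, and the helper name last_seen is hypothetical.

```r
df <- data.frame(study = c("A", "", "", "B", "C", ""),
                 outcome = c("mortality", "mortality", "surgery",
                             "mortality", "mortality", "surgery"),
                 time.point = c("30d", "1y", "10d", "1y", "5y", "20d"),
                 stringsAsFactors = FALSE)

# Non-empty rows contribute their own index, empty rows contribute 0;
# cummax then carries the latest non-empty index forward: 1 1 1 4 5 5
last_seen <- cummax(seq_along(df$study) * (df$study != ""))

df$rowname <- paste(df$study[last_seen], df$outcome, df$time.point, sep = "_")
```

This sketch assumes the first row has a study name. Otherwise the leading 0 indices would be silently dropped by the subsetting, and the result would be misaligned.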
Reputation: 27732
Here is an approach that changes the empty strings to NA and then fills them with zoo::na.locf (last observation carried forward):
library( data.table ); library( zoo )
#make it a data.table
setDT(df)
#set empty strings as NA
df[ study == "", study := NA_character_ ]
#create new column
df[, rowname := paste( zoo::na.locf( study), outcome, time.point, sep = "_")][]
# study outcome time.point rowname
# 1: A mortality 30d A_mortality_30d
# 2: <NA> mortality 1y A_mortality_1y
# 3: <NA> surgery 10d A_surgery_10d
# 4: B mortality 1y B_mortality_1y
# 5: C mortality 5y C_mortality_5y
# 6: <NA> surgery 20d C_surgery_20d
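Instead of carrying values forward, you can number the study blocks with cumsum, so that each non-empty study starts a new block, and then take the first study name within each block. Below is a base R sketch of that alternative using ave. The names block and study_filled are hypothetical, and the sketch assumes the first row has a study name.

```r
df <- data.frame(study = c("A", "", "", "B", "C", ""),
                 outcome = c("mortality", "mortality", "surgery",
                             "mortality", "mortality", "surgery"),
                 time.point = c("30d", "1y", "10d", "1y", "5y", "20d"),
                 stringsAsFactors = FALSE)

# Block id increments at every row whose study is non-empty: 1 1 1 2 3 3
block <- cumsum(df$study != "")

# Within each block, replace every entry with the block's first study name
study_filled <- ave(df$study, block, FUN = function(s) s[1])

df$rowname <- paste(study_filled, df$outcome, df$time.point, sep = "_")
```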
Reputation: 8572
You could do something like:
library(tidyverse)
df$rowname <- df %>%
  mutate(study = case_when(study == "" ~ NA_character_,
                           TRUE ~ study)) %>%
  fill(study, .direction = 'down') %>%
  (function(x) mapply(paste, sep = '_', study = x$study,
                      outcome = x$outcome, time.point = x$time.point))
# alternatively, use rownames(df) <- ...
df
# study outcome time.point rowname
# 1 A mortality 30d A_mortality_30d
# 2 mortality 1y A_mortality_1y
# 3 surgery 10d A_surgery_10d
# 4 B mortality 1y B_mortality_1y
# 5 C mortality 5y C_mortality_5y
# 6 surgery 20d C_surgery_20d
Here I first replace the missing (empty) studies with NA_character_ so that I can use fill to fill in the "" values. Then I use mapply to iterate over the values in each column. The mapply call is wrapped in an anonymous function only so that it can be used within a pipe.
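Because paste is already vectorised over its arguments, iterating element by element with mapply produces the same strings as a single paste call. The only difference is that mapply also attaches names to its result. Here is a small sketch comparing the two, using made-up vectors. In this sketch, sep is passed through MoreArgs so that it is not treated as one of the vectors to iterate over.

```r
study <- c("A", "A", "B")
outcome <- c("mortality", "surgery", "mortality")
time.point <- c("30d", "10d", "1y")

# Element-wise: paste is called once per position
via_mapply <- mapply(paste, study, outcome, time.point,
                     MoreArgs = list(sep = "_"))

# Vectorised: one call handles all positions at once
via_paste <- paste(study, outcome, time.point, sep = "_")

# Same values; unname() drops the names that mapply adds
identical(unname(via_mapply), via_paste)   # TRUE
```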