Phil.He

Reputation: 154

How to complete rownames in R?

I have imported a table that looks like this:

df <- data.frame(study=c("A", "", "", "B", "C", ""), 
                 outcome=c("mortality", "mortality", "surgery", "mortality", "mortality", "surgery"), 
                 time.point=c("30d", "1y", "10d", "1y", "5y", "20d"))

The 2nd and 3rd outcomes belong to study A, and the 6th outcome belongs to study C. My table contains many cases like this, with an irregular number of outcomes and time points per study.

How can I assign a meaningful name to each row that indicates the study, the outcome and the time point?

I want it to look like this:

df_new <- data.frame(study=c("A", "", "", "B", "C", ""), 
                     outcome=c("mortality", "mortality", "surgery", "mortality", "mortality", "surgery"), 
                     time.point=c("30d", "1y", "10d", "1y", "5y", "20d"), 
                     rowname=c("A_mortality_30d", "A_mortality_1y", "A_surgery_10d", "B_mortality_1y", "C_mortality_5y", "C_surgery_20d"))

Thank you so much!

Upvotes: 3

Views: 105

Answers (4)

Anonymous

Reputation: 303

Base R solution: use grep to get the row indices of the non-empty studies, diff to count how many rows each study spans, and rep to repeat each study name that many times.

studies <- df[df$study != "", "study"]
reps <- diff(c(grep(".", df$study), nrow(df) +1))

rownames(df) <- paste(rep(studies, reps), df$outcome, df$time.point, sep="_")

> df
                study   outcome time.point
A_mortality_30d     A mortality        30d
A_mortality_1y        mortality         1y
A_surgery_10d           surgery        10d
B_mortality_1y      B mortality         1y
C_mortality_5y      C mortality         5y
C_surgery_20d           surgery        20d
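
To make the grep / diff / rep steps easier to follow, here is a minimal sketch that prints each intermediate value for the example data from the question. The names `starts` and `filled` are only illustrative and not part of the answer above, and the sketch assumes the first row always has a non-empty study.

```r
# Example data from the question
df <- data.frame(study = c("A", "", "", "B", "C", ""),
                 outcome = c("mortality", "mortality", "surgery",
                             "mortality", "mortality", "surgery"),
                 time.point = c("30d", "1y", "10d", "1y", "5y", "20d"),
                 stringsAsFactors = FALSE)

# Row indices where a study name is present ("." matches any character)
starts <- grep(".", df$study)
print(starts)   # 1 4 5

# Number of rows per study: distance to the next start, with
# nrow(df) + 1 acting as the end marker for the last study
reps <- diff(c(starts, nrow(df) + 1))
print(reps)     # 3 1 2

# Repeat each study name over the rows it covers
filled <- rep(df$study[starts], reps)
print(filled)   # "A" "A" "A" "B" "C" "C"
```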

Upvotes: 2

TarJae

Reputation: 78937

Credit to Oliver: the first part (turning the empty strings into NA and filling them down) is from his answer, he was faster. You can then use unite from the tidyr package to build the name column.

library(tidyr)
library(dplyr)
df1 <- df %>% 
  mutate(study = case_when(study == "" ~ NA_character_ ,
                           TRUE ~ study)) %>% 
  fill(study, .direction = 'down') %>%
  unite(rowname, study, outcome, time.point, sep= "_", remove = FALSE)

Upvotes: 3

Wimpel

Reputation: 27732

Here is an approach that changes the empty strings to NA and then carries the last non-missing study forward with zoo::na.locf:

library( data.table ); library( zoo )
#make it a data.table
setDT(df)
#set empty strings as NA
df[ study == "", study := NA_character_ ]
#create new column
df[, rowname := paste( zoo::na.locf( study), outcome, time.point, sep = "_")][]
#    study   outcome time.point         rowname
# 1:     A mortality        30d A_mortality_30d
# 2:  <NA> mortality         1y  A_mortality_1y
# 3:  <NA>   surgery        10d   A_surgery_10d
# 4:     B mortality         1y  B_mortality_1y
# 5:     C mortality         5y  C_mortality_5y
# 6:  <NA>   surgery        20d   C_surgery_20d

Upvotes: 3

Oliver

Reputation: 8572

You could do something like:

library(tidyverse)
df$rowname <- df %>% mutate(study = case_when(study == "" ~ NA_character_ ,
                                TRUE ~ study)) %>% 
  fill(study, .direction = 'down') %>%
  (function(x)mapply(paste, sep = '_', study = x$study, outcome = x$outcome, time.point = x$time.point))

#alternative use rownames(df) <- ...

df
#   study   outcome time.point         rowname
# 1     A mortality        30d A_mortality_30d
# 2       mortality         1y  A_mortality_1y
# 3         surgery        10d   A_surgery_10d
# 4     B mortality         1y  B_mortality_1y
# 5     C mortality         5y  C_mortality_5y
# 6         surgery        20d   C_surgery_20d

Here I first replace the empty study values with NA_character_ so that fill can carry the previous study name down into them. Then I use mapply to iterate over the values in each column. The mapply call is wrapped in an anonymous function only because I want to use it within a pipe.
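
Since paste is already vectorised over its arguments, the element-wise iteration can also be sketched without mapply and without any packages. Note that this swaps in a different technique for the fill step (indexing with cumsum rather than tidyr's fill); the names `has_study` and `filled` are only illustrative, and it assumes the first row has a non-empty study.

```r
# Example data from the question
df <- data.frame(study = c("A", "", "", "B", "C", ""),
                 outcome = c("mortality", "mortality", "surgery",
                             "mortality", "mortality", "surgery"),
                 time.point = c("30d", "1y", "10d", "1y", "5y", "20d"),
                 stringsAsFactors = FALSE)

# TRUE where a study name is present
has_study <- df$study != ""

# cumsum() numbers the groups (1 1 1 2 3 3); using that to index the
# non-empty names fills each blank with the most recent study name
filled <- df$study[has_study][cumsum(has_study)]

# paste() works element-wise on whole columns
df$rowname <- paste(filled, df$outcome, df$time.point, sep = "_")
print(df$rowname)
```

If you prefer real row names over an extra column, assign the same vector with rownames(df) <- instead.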

Upvotes: 2
