How to create a dataframe using data from another dataframe?

I'm not sure if this is possible in R, but I have a dataframe original_data with one row and columns as follows:

A  Ar   A1   A1r   B    Br   B1   B1r   C   Cr   C1   C1r......
0  0.1  0.5  0.1  0.1  0.6  0.7  1.2   1.4  1.2  1.5  1.8.....

structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1, 
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5, 
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

To explain what A, Ar, A1, and A1r mean:

A: ID with the measurement taken at Visit 1.

Ar: Same ID as A, but a replicate of the Visit 1 measurement.

A1: Same ID as A, but the measurement taken at Visit 2.

A1r: Same ID as A, but a replicate of the measurement A1 (Visit 2).
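The naming scheme above can be decoded mechanically from each column name. A minimal base-R sketch, assuming every name is an uppercase ID, an optional visit digit, and an optional trailing lowercase "r"; the helper `parse_names()` is hypothetical and not part of the question:

```r
# Hypothetical helper: split names like "A", "Ar", "A1", "A1r" into ID / Visit / Replicate.
# Assumes every name matches <uppercase ID><optional digits><optional "r">.
parse_names <- function(nm) {
  parts <- regmatches(nm, regexec("^([[:upper:]]+)([0-9]*)(r?)$", nm))
  data.frame(
    ID = vapply(parts, function(p) p[2], character(1)),
    # no digit -> Visit 1; digit k -> Visit k + 1 (so "1" means Visit 2)
    Visit = vapply(parts, function(p) if (p[3] == "") 1 else as.numeric(p[3]) + 1, numeric(1)),
    # trailing "r" -> replicate 2, otherwise replicate 1
    Replicate = vapply(parts, function(p) if (p[4] == "r") 2 else 1, numeric(1)),
    stringsAsFactors = FALSE
  )
}

res <- parse_names(c("A", "Ar", "A1", "A1r"))
res
```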

I want to transform it to a dataframe that looks as follows:

ID   Visit   Replicate   Value
A     1         1         0
A     1         2         0.1
A     2         1         0.5
A     2         2         0.1
B     1         1         0.1
B     1         2         0.6
B     2         1         0.7
B     2         2         1.2

I tried to do it in R:

new_data_frame = data.frame(ID=character(0),Visit=integer(0),Replicate=integer(0),Value=integer(0))

for(i in 1:ncol(original_data))
{
    #this is for the column "ID"
    new_data_frame$ID[i]=colnames(original_data)[i]

    #this is for the column "Replicate"
    if(grepl("r",colnames(original_data)[i])==TRUE)
    {
        new_data_frame$Replicate[i]=2
    }
    else
    {
        new_data_frame$Replicate[i]=1
    }

    #this is for the column "Visit"
    if(grepl("1",colnames(original_data)[i])==TRUE)
    {
        new_data_frame$Visit[i]=2
    }
    else
    {
        new_data_frame$Visit[i]=1
    }

    #this is for the column "Value"
    new_data_frame$Value[i]=original_data[,i]
}

I get an error:

Error in `$<-.data.frame`(`*tmp*`, "ID", value = NA_integer_) : 
  replacement has 1 row, data has 0

How can I fix my code to make this work?
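The error is raised by the first assignment in the loop: `new_data_frame` starts with zero rows, so `new_data_frame$ID[i] = ...` tries to store a length-1 column in a 0-row data frame. A minimal sketch of one way to keep the loop, assuming the one-row `original_data` from the question: preallocate one row per input column, and take the ID as the first character of the column name (this assumes one-letter IDs).

```r
# One-row input from the question
original_data <- structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

n <- ncol(original_data)

# Preallocate one row per input column, so element-wise assignment
# such as new_data_frame$ID[i] <- ... no longer changes the number of rows
new_data_frame <- data.frame(ID = character(n), Visit = integer(n),
                             Replicate = integer(n), Value = numeric(n),
                             stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  name <- colnames(original_data)[i]
  new_data_frame$ID[i] <- substr(name, 1, 1)                  # assumes one-letter IDs
  new_data_frame$Replicate[i] <- if (grepl("r", name)) 2L else 1L
  new_data_frame$Visit[i] <- if (grepl("1", name)) 2L else 1L # "1" in the name marks Visit 2
  new_data_frame$Value[i] <- original_data[1, i]
}

new_data_frame
```

Filling a preallocated data frame row by row works, but building whole vectors first and combining them once is usually simpler in R.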


Answers (4)

sm925


Here's a solution that uses stack() to convert the data to long format and then data.table to derive the new columns:

library(data.table)
df <- stack(df)   # df is the one-row data frame from the question
setDT(df)[, ID := substr(ind, 1, 1)][
  , Visit := ifelse(grepl("\\d", ind), as.numeric(gsub("[^0-9.]", "", ind)) + 1, 1)][
  , Replicate := ifelse(grepl("r", ind), 2, 1)][
  , c("ID", "Visit", "Replicate", "values")]

#   ID Visit Replicate values
#1:  A     1         1    0.0
#2:  A     1         2    0.1
#3:  A     2         1    0.5
#4:  A     2         2    0.1
#5:  B     1         1    0.1
#6:  B     1         2    0.6
#7:  B     2         1    0.7
#8:  B     2         2    1.2
#9:  C     1         1    1.4
#10: C     1         2    1.2
#11: C     2         1    1.5
#12: C     2         2    1.8
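For comparison, a minimal base-R sketch of the same stack()-based idea without data.table, assuming `original_data` is the one-row data frame from the question. stack() returns a `values` column plus an `ind` factor holding the original column names, and the other columns are derived from those names:

```r
original_data <- structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

long <- stack(original_data)   # columns: values, ind (factor of column names)
nm <- as.character(long$ind)

result <- data.frame(
  ID = substr(nm, 1, 1),       # assumes one-letter IDs
  # the digit in the name is (visit - 1); names without a digit are Visit 1
  Visit = 1 + as.numeric(ifelse(grepl("\\d", nm), gsub("\\D", "", nm), "0")),
  Replicate = ifelse(grepl("r", nm), 2, 1),
  Value = long$values,
  stringsAsFactors = FALSE
)
result
```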


IceCreamToucan


The ID is the first character, Visit is 1 + (the number in the name or 0 if no number), Replicate is 1 + (1 if the name ends in 'r' else 0), and Value is the value of the unlisted data.frame.

df_vec <- unlist(df)

data.frame(
  ID = substr(names(df_vec), 1, 1),
  Visit = 1 + dplyr::coalesce(readr::parse_number(names(df_vec)), 0),
  Replicate = 1 + grepl('r$', names(df_vec)),
  Value = df_vec)

#     ID Visit Replicate Value
# A    A     1         1   0.0
# Ar   A     1         2   0.1
# A1   A     2         1   0.5
# A1r  A     2         2   0.1
# B    B     1         1   0.1
# Br   B     1         2   0.6
# B1   B     2         1   0.7
# B1r  B     2         2   1.2
# C    C     1         1   1.4
# Cr   C     1         2   1.2
# C1   C     2         1   1.5
# C1r  C     2         2   1.8
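The Visit and Replicate rules described in this answer can also be written without readr or dplyr. A minimal sketch; `number_or_zero()` is a hypothetical helper standing in for `dplyr::coalesce(readr::parse_number(x), 0)`, and it assumes the only digits in a column name are the visit number (no decimals or signs):

```r
# Hypothetical base-R stand-in for dplyr::coalesce(readr::parse_number(x), 0)
number_or_zero <- function(x) {
  digits <- gsub("[^0-9]", "", x)   # keep only the digits
  out <- numeric(length(x))         # default 0 where there are no digits
  has_digits <- digits != ""
  out[has_digits] <- as.numeric(digits[has_digits])
  out
}

nm <- c("A", "Ar", "A1", "A1r")
visit <- 1 + number_or_zero(nm)     # 1 1 2 2
replicate <- 1 + grepl("r$", nm)    # TRUE/FALSE count as 1/0 in arithmetic: 1 2 1 2
```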


Andrew


Here is one solution using tidyverse packages. It transforms your dataframe into long format and uses the (old) column names to extract the information you need. As written, it assumes at most one replicate per visit (marked by the trailing "r"), but it allows more than two visits. If there can only be two visits, the creation of the Visit variable could be simplified:

library(tidyr)
library(dplyr)

df1 %>%
  pivot_longer(everything()) %>%
  transmute(ID = gsub("(\\d+|r)", "", name),
            Visit = ifelse(grepl("\\d", name), 1 + as.integer(gsub("\\D", "", name)), 1),
            Replicate = ifelse(grepl("r", name, fixed = TRUE), 2, 1),
            Value = value)

# A tibble: 12 x 4
   ID    Visit Replicate Value
   <chr> <dbl>     <dbl> <dbl>
 1 A         1         1   0  
 2 A         1         2   0.1
 3 A         2         1   0.5
 4 A         2         2   0.1
 5 B         1         1   0.1
 6 B         1         2   0.6
 7 B         2         1   0.7
 8 B         2         2   1.2
 9 C         1         1   1.4
10 C         1         2   1.2
11 C         2         1   1.5
12 C         2         2   1.8
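The answer notes that the Visit expression could be simplified if there are only ever two visits. A small base-R sketch of the difference between the two rules; the name "A2" is a hypothetical third visit that does not occur in the question's data, included only to show where the simplified rule stops being correct:

```r
# Column names; "A2" is a hypothetical third visit, not in the question's data
nm <- c("A", "Ar", "A1", "A1r", "A2")

# General rule from the answer: 1 + the number in the name, or 1 if there is none
visit_general <- ifelse(grepl("\\d", nm), 1 + as.integer(gsub("\\D", "", nm)), 1)

# Simplified rule, valid only when there are exactly two visits
visit_two <- ifelse(grepl("\\d", nm), 2, 1)

visit_general  # 1 1 2 2 3
visit_two      # 1 1 2 2 2
```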


Rajnish kumar


I am new to R, but this worked for me. You can build a new data frame from columns of an existing one like this:

New_data <- data.frame("variable1" = old$variable1, "variable2" = old$variable2, "variable3" = old$variable3)
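Applied to the question's data, a minimal sketch of this approach; the choice of columns is only illustrative. Note that this copies columns side by side and does not reshape the wide data into the long ID/Visit/Replicate/Value layout:

```r
# Assumed source data: the one-row data frame from the question
original_data <- structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

# Build a new data frame from selected columns of the old one
visit1 <- data.frame(A = original_data$A, Ar = original_data$Ar)

# Equivalent shortcut: subset the columns by name with [
visit1_subset <- original_data[c("A", "Ar")]
```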

