Reputation: 61
I'm not sure if this is possible in R, but I have a dataframe original_data
with one row and columns as follows:
A   Ar   A1   A1r  B    Br   B1   B1r  C    Cr   C1   C1r  ...
0   0.1  0.5  0.1  0.1  0.6  0.7  1.2  1.4  1.2  1.5  1.8  ...
structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")
To explain what A, Ar, A1, and A1r mean:
A: ID with measurement taken at Visit 1.
Ar: Same ID as A, but a replicate from Visit 1.
A1: Same ID as A, but measurement taken at Visit 2.
A1r: Same ID as A, but a replicate of the measurement A1.
I want to transform it to a dataframe that looks as follows:
ID Visit Replicate Value
A 1 1 0
A 1 2 0.1
A 2 1 0.5
A 2 2 0.1
B 1 1 0.1
B 1 2 0.6
B 2 1 0.7
B 2 2 1.2
I tried to do it in R:
new_data_frame = data.frame(ID=character(0),Visit=integer(0),Replicate=integer(0),Value=integer(0))
for(i in 1:ncol(original_data))
{
  #this is for the column "ID"
  new_data_frame$ID[i]=colnames(original_data)[i]
  #this is for the column "Replicate"
  if(grepl("r",colnames(original_data)[i])==True)
  {
    new_data_frame$Replicate[i]=2
  }
  else
  {
    new_data_frame$Replicate[i]=1
  }
  #this is for the column "Visit"
  if(grepl("1",colnames(original_data)[i])==True)
  {
    new_data_frame$Visit[i]=2
  }
  else
  {
    new_data_frame$Visit[i]=1
  }
  #this is for the column "Value"
  new_data_frame$Value[i]=original_data[,i]
}
I get an error:
Error in `$<-.data.frame`(`*tmp*`, "ID", value = NA_integer_) :
replacement has 1 row, data has 0
How can I fix my code to make this work?
Upvotes: 3
Views: 1386
Reputation: 2678
Here's a solution using stack to convert the data into long format and then data.table to derive the new columns:
library(data.table)
df <- stack(df)
setDT(df)[, ID := substr(ind, 1, 1)
        ][, Visit := ifelse(grepl("\\d", ind), as.numeric(gsub("[^0-9.]", "", ind)) + 1, 1)
        ][, Replicate := ifelse(grepl("r", ind), 2, 1)
        ][, c("ID", "Visit", "Replicate", "values")]
# ID Visit Replicate values
#1: A 1 1 0.0
#2: A 1 2 0.1
#3: A 2 1 0.5
#4: A 2 2 0.1
#5: B 1 1 0.1
#6: B 1 2 0.6
#7: B 2 1 0.7
#8: B 2 2 1.2
#9: C 1 1 1.4
#10: C 1 2 1.2
#11: C 2 1 1.5
#12: C 2 2 1.8
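To see what the stack() step hands to the data.table chain, here is a minimal base R sketch on a shortened, four-column copy of the question's data. The name `long` is only for illustration and is not part of the answer above.

```r
# Shortened copy of the question's one-row data frame (first four columns only).
original_data <- data.frame(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1)

# stack() turns the columns into rows: one row per original column.
long <- stack(original_data)

names(long)              # "values" "ind"
as.character(long$ind)   # "A" "Ar" "A1" "A1r" (the old column names)
long$values              # 0.0 0.1 0.5 0.1
```

The `ind` column is what the `substr()`, `grepl()`, and `gsub()` calls in the chain operate on.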
Upvotes: 1
Reputation: 28705
The ID is the first character, Visit is 1 + (the number in the name or 0 if no number), Replicate is 1 + (1 if the name ends in 'r' else 0), and Value is the value of the unlisted data.frame.
df_vec <- unlist(df)
data.frame(
  ID = substr(names(df_vec), 1, 1),
  Visit = 1 + dplyr::coalesce(readr::parse_number(names(df_vec)), 0),
  Replicate = 1 + grepl('r$', names(df_vec)),
  Value = df_vec)
# ID Visit Replicate Value
# A A 1 1 0.0
# Ar A 1 2 0.1
# A1 A 2 1 0.5
# A1r A 2 2 0.1
# B B 1 1 0.1
# Br B 1 2 0.6
# B1 B 2 1 0.7
# B1r B 2 2 1.2
# C C 1 1 1.4
# Cr C 1 2 1.2
# C1 C 2 1 1.5
# C1r C 2 2 1.8
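If you would rather not load dplyr and readr, the same rule can be sketched in base R alone. This is an illustrative variant, not a drop-in replacement: it assumes, like the code above, that every ID is a single letter, and the names `nm`, `digits`, and `long` are made up for the example.

```r
# The question's data, rebuilt from its dput() output.
original_data <- structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

nm <- names(original_data)

# Digits found in each name, or "" when the name has none.
digits <- gsub("\\D", "", nm)

long <- data.frame(
  ID        = substr(nm, 1, 1),                               # first character
  Visit     = ifelse(digits == "", 1, as.numeric(digits) + 1), # 1 + number, or 1
  Replicate = 1 + grepl("r$", nm),                            # 2 if the name ends in "r"
  Value     = unlist(original_data, use.names = FALSE)
)
```

Passing `use.names = FALSE` gives the result plain integer row names instead of the old column names.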
Upvotes: 5
Reputation: 5138
Here is one solution using tidyverse packages. It reshapes your dataframe into long format and extracts the information you need from the (old) column names. As written, it assumes there can be at most one replicate per visit, but it allows more than two visits. If there can only be two visits, the creation of the Visit variable could easily be simplified:
library(tidyr)
library(dplyr)
df1 %>%
  pivot_longer(everything()) %>%
  transmute(ID = gsub("(\\d+|r)", "", name),
            Visit = ifelse(grepl("\\d", name), 1 + as.integer(gsub("\\D", "", name)), 1),
            Replicate = ifelse(grepl("r", name, fixed = TRUE), 2, 1),
            Value = value)  # keep the measurement; transmute() drops unnamed columns
# A tibble: 12 x 4
   ID    Visit Replicate Value
   <chr> <dbl>     <dbl> <dbl>
 1 A         1         1   0
 2 A         1         2   0.1
 3 A         2         1   0.5
 4 A         2         2   0.1
 5 B         1         1   0.1
 6 B         1         2   0.6
 7 B         2         1   0.7
 8 B         2         2   1.2
 9 C         1         1   1.4
10 C         1         2   1.2
11 C         2         1   1.5
12 C         2         2   1.8
Upvotes: 2
Reputation: 196
I'm new to this, but the following worked for me. You can build the new data frame directly from the columns of the old one:
New_data <- data.frame("variable1" = old$variable1,
                       "variable2" = old$variable2,
                       "variable3" = old$variable3)
Upvotes: 0