Reputation: 45
I have a CSV file with genetic information for several diploid individuals, with two columns (one per allele) for each locus. It looks like the following test file:
Sample L1 L1.1 L2 L2.1
1 100 100 200 220
2 100 100 220 220
3 110 100 220 220
4 100 110 220 220
5 110 110 220 220
I need to combine the information so that the data frame looks as follows:
Sample L1 L2
1 100 200
1 100 220
2 100 220
2 100 220
3 110 220
3 100 220
4 100 220
4 110 220
5 110 220
5 110 220
Does anyone know of a nice way to do this for a large data set?
Many thanks.
Upvotes: 3
Views: 122
Reputation: 61154
Here's an approach using reshape():
> df <- read.table(text="Sample L1 L1.1 L2 L2.1
1 100 100 200 220
2 100 100 220 220
3 110 100 220 220
4 100 110 220 220
5 110 110 220 220
", header=T) # this is your data.frame
> new_df <- reshape(df, idvar = "Sample", direction = "long",
varying = list(c("L1", "L1.1"), c("L2", "L2.1")), v.names = c("L1", "L2")) # stack each allele pair into one column
> new_df <- new_df[order(new_df$Sample), c("Sample", "L1", "L2")] # group rows by sample, drop the time column
> rownames(new_df) <- NULL
> new_df
Sample L1 L2
1 1 100 200
2 1 100 220
3 2 100 220
4 2 100 220
5 3 110 220
6 3 100 220
7 4 100 220
8 4 110 220
9 5 110 220
10 5 110 220
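For a data set with many loci, listing every column pair by hand is impractical. The following is a minimal sketch of how the `varying` argument of `reshape()` could be built from the column names instead. It assumes that the second allele of each locus is always named `<locus>.1` (which is what `read.table` and `read.csv` produce for a duplicated header name) and that no locus name itself ends in `.1`; the names `loci`, `varying` and `long` are just illustrative.

```r
# Example data in the same layout as the question (two columns per locus).
df <- read.table(text = "Sample L1 L1.1 L2 L2.1
1 100 100 200 220
2 100 100 220 220
3 110 100 220 220
4 100 110 220 220
5 110 110 220 220
", header = TRUE)

# Assumption: second-allele columns are named <locus>.1, so every other
# column name (apart from Sample) is a locus name.
loci <- grep("\\.1$", names(df)[-1], value = TRUE, invert = TRUE)

# One character vector per locus: c("L1", "L1.1"), c("L2", "L2.1"), ...
varying <- lapply(loci, function(l) c(l, paste0(l, ".1")))

# Stack each pair of allele columns into a single column per locus.
long <- reshape(df, idvar = "Sample", varying = varying,
                v.names = loci, direction = "long")

# Keep both alleles of a sample on adjacent rows and drop the time column.
long <- long[order(long$Sample, long$time), c("Sample", loci)]
rownames(long) <- NULL
long
```

On the test file this gives ten rows, two per sample, with one column per locus. If the real column names follow a different convention, only the `grep()` and `paste0()` lines should need adjusting.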
Upvotes: 2
Reputation: 56149
Try this:
#dummy data
df <- read.table(text="
Sample L1 L1.1 L2 L2.1
1 100 100 200 220
2 100 100 220 220
3 110 100 220 220
4 100 110 220 220
5 110 110 220 220
",header=T)
#transform
df1 <- data.frame(Sample=rep(df$Sample,2),
L1=c(df$L1,df$L1.1),
L2=c(df$L2,df$L2.1))
#order by Sample
df1[order(df1$Sample),]
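The same stacking idea can be extended to any number of loci by selecting columns by position rather than by name. This is only a sketch: it assumes that, after `Sample`, the columns always come in adjacent pairs (first allele, then second allele), and the names `first`, `second`, `a1`, `a2` and `stacked` are made up for illustration.

```r
# Example data in the same layout as the question.
df <- read.table(text = "
Sample L1 L1.1 L2 L2.1
1 100 100 200 220
2 100 100 220 220
3 110 100 220 220
4 100 110 220 220
5 110 110 220 220
", header = TRUE)

# Assumption: columns 2, 4, 6, ... hold the first allele of each locus
# and columns 3, 5, 7, ... hold the matching second allele.
first  <- seq(2, ncol(df), by = 2)
second <- first + 1

# Two data frames with identical column names, one per allele.
a1 <- df[, c(1, first)]
a2 <- setNames(df[, c(1, second)], names(a1))

# Stack them, then sort so both alleles of a sample sit together.
# order() keeps ties in their original order, so the first allele stays first.
stacked <- rbind(a1, a2)
stacked <- stacked[order(stacked$Sample), ]
rownames(stacked) <- NULL
stacked
```

Because it never loops over individual loci, this version should scale to wide files without changes, as long as the pairing assumption holds.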
Upvotes: 3