Jarl

Reputation: 45

Manipulating genotype file in R

I have a CSV file with genetic information for several diploid individuals, with two columns per locus (one per allele). It looks like the following test file:

 Sample  L1 L1.1  L2 L2.1
      1 100  100 200  220
      2 100  100 220  220
      3 110  100 220  220
      4 100  110 220  220
      5 110  110 220  220

I need to combine the information so that each locus has a single column and each individual has two rows, one per allele. The data frame should look as follows:

 Sample  L1  L2
      1 100 200
      1 100 220
      2 100 220
      2 100 220
      3 110 220
      3 100 220
      4 100 220
      4 110 220
      5 110 220
      5 110 220

Does anyone know of a nice way to do this for a large data set?

Many thanks.

Upvotes: 3

Views: 122

Answers (2)

Jilber Urbina

Reputation: 61154

Here's an approach

> df <- read.table(text="Sample  L1 L1.1  L2 L2.1
        1 100  100 200  220
        2 100  100 220  220
        3 110  100 220  220
        4 100  110 220  220
        5 110  110 220  220
  ", header=T)  # this is your data.frame

> # pair up the two allele columns of each locus, keeping the original column names
> new_df <- reshape(df, idvar = "Sample",
                    varying = list(c("L1", "L1.1"), c("L2", "L2.1")),
                    v.names = c("L1", "L2"), direction = "long")
> 
> # order by sample, then allele, and drop the helper "time" column
> new_df <- new_df[order(new_df$Sample, new_df$time), c("Sample", "L1", "L2")]
> rownames(new_df) <- NULL
> new_df
   Sample  L1  L2
1       1 100 200
2       1 100 220
3       2 100 220
4       2 100 220
5       3 110 220
6       3 100 220
7       4 100 220
8       4 110 220
9       5 110 220
10      5 110 220

Upvotes: 2

zx8754

Reputation: 56149

Try this:

#dummy data
df <- read.table(text="
Sample  L1 L1.1  L2 L2.1
1 100  100 200  220
2 100  100 220  220
3 110  100 220  220
4 100  110 220  220
5 110  110 220  220
",header=T)

#transform
df1 <- data.frame(Sample=rep(df$Sample,2),
                  L1=c(df$L1,df$L1.1),
                  L2=c(df$L2,df$L2.1))
#order by Sample
df1[order(df1$Sample),]
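
For a larger data set with many loci, typing out each column pair by hand gets tedious. The same stacking idea can be sketched without naming the loci, assuming (this is not stated in the question) that the first column is Sample and every remaining column belongs to an adjacent allele pair such as L1, L1.1, L2, L2.1. The helper names `first`, `second`, `a1`, `a2` and `long` are only illustrative.

```r
# Example data from the question
df <- read.table(text = "
Sample  L1 L1.1  L2 L2.1
1 100  100 200  220
2 100  100 220  220
3 110  100 220  220
4 100  110 220  220
5 110  110 220  220
", header = TRUE)

# Assumption: column 1 is Sample and the remaining columns are adjacent
# allele pairs (L1, L1.1, L2, L2.1, ...).
allele_cols <- names(df)[-1]
first  <- allele_cols[seq(1, length(allele_cols), by = 2)]  # L1, L2, ...
second <- allele_cols[seq(2, length(allele_cols), by = 2)]  # L1.1, L2.1, ...

# Stack the second allele of every locus under the first one,
# reusing the first-allele column names for both halves
a1 <- df[, c("Sample", first), drop = FALSE]
a2 <- setNames(df[, c("Sample", second), drop = FALSE], names(a1))
long <- rbind(a1, a2)

# Keep each sample's two rows together; order() leaves ties in their
# original order, so the first allele stays above the second
long <- long[order(long$Sample), ]
rownames(long) <- NULL
long
```

If the columns in the real file are not laid out in adjacent pairs, the two index vectors would need to be built from the column names instead.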

Upvotes: 3
