Reputation: 313
I have a newbie question for experienced R users :-). I have an object of class "loci" with rows corresponding to individuals and columns corresponding to genotypes at different SNP loci (plus one column for population information):
gen.loc
Allelic data frame: 283 individuals
151 loci
1 additional variable
as.data.frame(gen.loc)
population PBA10091 PBA10106 PBA10242 PBA10272 PBA11037 PBA11455 PBA11744
001 ANTE 01/02 01/01 01/01 02/02 02/02 02/02 01/01
002 ANTE 01/01 01/01 01/01 02/02 01/02 02/02 01/02
003 ANTE 01/01 02/02 01/01 02/02 02/02 01/02 01/01
004 ANTE 01/01 01/01 01/01 02/02 02/02 01/02 01/01
005 ANTE 01/02 02/02 01/01 02/02 02/02 02/02 01/02
006 ANTE 01/01 02/02 01/02 01/02 01/02 02/02 01/01
I have 12 populations that are defined in my "population" column. I would like to calculate pairwise genotypic distance between individuals within each population.
With just one pop, the command would be:
d <- dist.gene(gen.loc, method="pairwise", pairwise.deletion = TRUE, variance = FALSE)
It returns an object of class 'dist' with pairwise differences between individuals.
However, I would like to split my data frame according to the 12 levels of the "population" column and run this procedure on each subset using an 'apply'-style function.
I tried the 'ddply' function of the plyr library:
ddply(as.data.frame(gen.loc), as.data.frame(gen.loc)$population, function(e) dist.gene(e, method="pairwise", pairwise.deletion = TRUE, variance = FALSE))
Unfortunately this command returns an error message:
Error in eval(expr, envir, enclos) : object 'ANTE' not found
Since 'ANTE' is the first pop that appears in the data frame, I guess the splitting went wrong somehow. I also suspect there could be an issue with the fact that the dist.gene output is a 'dist' object and not an actual R data frame.
Is there a better way to use ddply here? Or another approach to split my dataframe while applying the dist.gene command? Otherwise I guess I will just be creating one input dataframe per pop... :-) Not convenient if one has a large number of pops though!!
Thanks for any help!
All the best,
Chrys
Upvotes: 0
Views: 82
Reputation: 13591
Give this a try: split the data frame by population, then apply dist.gene to each piece.
df <- as.data.frame(gen.loc)
# Split the data frame into a list with one element per distinct population
split.df <- split(df, df$population)
# Iterate through the list and calculate pairwise distances within each population
result <- lapply(split.df, function(i) dist.gene(i, method="pairwise", pairwise.deletion = TRUE, variance = FALSE))
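As a rough illustration of the split/lapply idea, here is a minimal sketch on a toy data frame. The data, the "POST" population name, and the variable names `loci`, `by_pop` and `dists` are made up for the example, and dropping the "population" column before calling dist.gene is an assumption on my part: the column is constant within each group, so it should not change the pairwise counts, but leaving it out keeps the input to genotype columns only.

```r
library(ape)  # provides dist.gene()

# Toy genotype table (hypothetical data, not the asker's real data set)
df <- data.frame(
  population = c("ANTE", "ANTE", "ANTE", "POST", "POST"),
  PBA10091   = c("01/02", "01/01", "01/01", "01/01", "01/02"),
  PBA10106   = c("01/01", "01/01", "02/02", "02/02", "02/02"),
  row.names  = c("001", "002", "003", "101", "102"),
  stringsAsFactors = FALSE
)

# Keep only the genotype columns (assumption: every other column is a locus)
loci <- setdiff(names(df), "population")

# One data frame of genotypes per population
by_pop <- split(df[, loci, drop = FALSE], df$population)

# One 'dist' object per population, returned as a named list
dists <- lapply(by_pop, dist.gene, method = "pairwise",
                pairwise.deletion = TRUE, variance = FALSE)

# A 'dist' object can be turned into a square matrix for inspection
ante_matrix <- as.matrix(dists$ANTE)
```

The result is a named list, so `dists$ANTE` (or `dists[["ANTE"]]`) gives the distances for that population. Because each element is a 'dist' object rather than a data frame, a plain list is probably a more natural container here than what ddply tries to build.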
Upvotes: 1