TPre

Reputation: 25

Rearrange data in R: merge two columns into one grouping column

Hi, I'm really new to R. We have multivariate data to analyse; the raw data is in Excel, and I want to rearrange the columns or group the data in R. Currently sex (B, S) and breed (R, W) are each in their own column, but I would like to merge breed and sex and then group the rows that share the same breed and sex. The possible breed and sex combinations are RB, RS, WB and WS. I want to separate the data by these joint factors, rather than by breed and sex individually, to perform an ANOVA. Sorry if this doesn't make sense, or if it isn't even possible! Thank you for any help.

This is a sample of the data. I don't know how to format it correctly here, sorry. It is only about 10 rows out of a sample size of 12500.

breed sex gestation_period days_to_110kg p1_plus_p3_fat_depth_mm
R     B   112              169.56        31.418
W     B   118              175.4         27.24
W     B   118              188.84        28.784
W     B   118              168.68        29.968
W     B   118              177.64        27.664
W     B   118              174.28        32.028
R     S   114              184.94        23.876
R     B   114              188.84        22.952
R     S   114              183.75        26.65

Call:
aov(formula = p1_plus_p3_fat_depth_mm ~ breed + sex + breed:sex, data = Pig)

Residuals:
    Min      1Q  Median      3Q     Max
-16.521  -2.904  -0.393   2.485  19.880

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.69350    0.08129 303.772  < 2e-16 ***
breedW       0.60700    0.10887   5.576 2.52e-08 ***
sexS         2.41582    0.10470  23.073  < 2e-16 ***
breedW:sexS  0.17186    0.15003   1.145    0.252

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.187 on 12800 degrees of freedom
Multiple R-squared: 0.08123, Adjusted R-squared: 0.08102
F-statistic: 377.2 on 3 and 12800 DF, p-value: < 2.2e-16

Upvotes: 0

Views: 109

Answers (1)

Humpelstielzchen

Reputation: 6441

If it's just an anova you want to do, you can skip the grouping part.

# Read in the data:
breed <- read.table(text = "
breed sex gestation_period days_to_110kg p1_plus_p3_fat_depth_mm
R B 112 169.56 31.418
W B 118 175.4 27.24
W B 118 188.84 28.784
W B 118 168.68 29.968
W B 118 177.64 27.664
W B 118 174.28 32.028
R S 114 184.94 23.876
R B 114 188.84 22.952
R S 114 183.75 26.65", stringsAsFactors = FALSE, header = TRUE)

#Performing ANOVA:
sexbreed_aov <- aov(p1_plus_p3_fat_depth_mm ~ breed + sex + breed:sex, data = breed)

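Check the result with summary() or anova(). Note: the sample data is too small for the interaction you are interested in. It contains no W/S rows, so the breed:sex term cannot be estimated and is missing from the table below. You can still apply this code to your full data as it is. The output shown here was produced with days_to_110kg as the response: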

> anova(sexbreed_aov)
Analysis of Variance Table

Response: days_to_110kg
          Df Sum Sq Mean Sq F value Pr(>F)
breed      1  51.30  51.296  0.7574 0.4176
sex        1  26.47  26.471  0.3909 0.5549
Residuals  6 406.34  67.723  
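
If you do want the merged breed/sex column from the question, here is a minimal sketch using base R's interaction(). It reuses the nine sample rows from the question; the names pigs, breed_sex and group_aov are only illustrative, and drop = TRUE is there because the sample happens to contain no W/S rows.

```r
# The nine sample rows from the question (not the full data set).
pigs <- read.table(text = "
breed sex gestation_period days_to_110kg p1_plus_p3_fat_depth_mm
R B 112 169.56 31.418
W B 118 175.4 27.24
W B 118 188.84 28.784
W B 118 168.68 29.968
W B 118 177.64 27.664
W B 118 174.28 32.028
R S 114 184.94 23.876
R B 114 188.84 22.952
R S 114 183.75 26.65", stringsAsFactors = FALSE, header = TRUE)

# Merge breed and sex into one factor with labels such as "RB" or "WS".
# drop = TRUE removes combinations that do not occur in the data.
pigs$breed_sex <- interaction(pigs$breed, pigs$sex, sep = "", drop = TRUE)

# Number of rows in each combined group.
print(table(pigs$breed_sex))

# One-way ANOVA across the combined groups.
group_aov <- aov(p1_plus_p3_fat_depth_mm ~ breed_sex, data = pigs)
print(summary(group_aov))
```

On the full data all four groups (RB, WB, RS, WS) should appear. Note that a one-way ANOVA on the merged factor tests whether any group differs, whereas breed + sex + breed:sex splits that variation into two main effects and the interaction.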

UPDATE (had to correct some things):

You shouldn't use summary.lm() for a two-factor ANOVA, as I first did; it is only useful for a one-way ANOVA. Use summary(sexbreed_aov) or anova(sexbreed_aov) instead, so forget about the breedW:sexS row. If you want to compare all specific breed and sex combinations, you can still run TukeyHSD(sexbreed_aov).
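
As a rough sketch of what the Tukey step looks like when all four breed/sex cells are present, the example below uses simulated data, because the nine sample rows have no W/S animals. The names sim, fat_depth, fit and tuk, and all the values, are made up for illustration; only aov() and TukeyHSD() with its which argument come from base R.

```r
set.seed(1)  # simulated data for illustration only, not the pig data

# Ten simulated animals in each of the four breed/sex cells.
sim <- expand.grid(breed = c("R", "W"), sex = c("B", "S"), rep = 1:10)
sim$breed <- factor(sim$breed)
sim$sex <- factor(sim$sex)
sim$fat_depth <- rnorm(nrow(sim), mean = 25, sd = 4)

# breed * sex is shorthand for breed + sex + breed:sex.
fit <- aov(fat_depth ~ breed * sex, data = sim)

# Pairwise comparisons between the four breed:sex combinations only.
tuk <- TukeyHSD(fit, which = "breed:sex")
print(tuk[["breed:sex"]])
```

The resulting table has one row per pair of combinations (six pairs for four groups), with the estimated difference, a confidence interval and an adjusted p-value.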

Upvotes: 0
