Reputation: 163
I have the two data sets below:
df <- read.table(text =
"Human_Gene_Name hsapiens mmusculus ggallus celegans dmelanogaster cintestinalis trubripes xtropicalis mmulatta
A1CF 5.634789603 4.787491743 3.688879454 2.079441542 3.931825633 2.772588722 3.871201011 3.044522438 4.094344562
AAK1 3.583518938 2.708050201 2.079441542 2.197224577 2.079441542 0.693147181 2.772588722 2.079441542 3.218875825
AAMP 3.555348061 3.17805383 2.48490665 1.791759469 2.302585093 0.693147181 2.48490665 1.098612289 2.079441542", header = T)
ctn_df <- read.table(text = "Species CTN
hsapiens 158
mmusculus 85
ggallus 67
celegans 32
dmelanogaster 27
cintestinalis 19
trubripes 110
xtropicalis 82
mmulatta 71
", header = T)
The values in 'df' represent functional diversity. For each gene, I want to compute the Pearson correlation coefficient between the species' CTN values and their functional diversity.
Is there an easy way to assign each species' CTN from 'ctn_df' to the matching species column in 'df'?
Sorry if this is a simple question.
Upvotes: 2
Views: 360
Reputation: 1027
Here's a Tidyverse solution:
library(tidyverse)
gather(df, Species, functional_diversity, -Human_Gene_Name) %>%
left_join(ctn_df, by = "Species") %>%
group_by(Human_Gene_Name) %>%
summarise(cor(functional_diversity, CTN))
# # A tibble: 3 x 2
# Human_Gene_Name `cor(functional_diversity, CTN)`
# <fct> <dbl>
# 1 A1CF 0.756
# 2 AAK1 0.783
# 3 AAMP 0.683
The first two lines reshape 'df' into a tidy (long) data frame with one row per gene and species, and attach the matching CTN from 'ctn_df' by species name. This makes the downstream per-gene calculation easier:
gather(df, Species, functional_diversity, -Human_Gene_Name) %>%
left_join(ctn_df, by = "Species")
# Human_Gene_Name Species functional_diversity CTN
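Since tidyr 1.0.0, gather() has been superseded by pivot_longer(). Here is a minimal sketch of the same pipeline using pivot_longer(). It assumes tidyr (>= 1.0) and dplyr are installed. The data are re-created so the block runs on its own, and the output column name r is only an illustrative choice:

```r
library(dplyr)
library(tidyr)

# Re-create the example data from the question
df <- read.table(text =
"Human_Gene_Name hsapiens mmusculus ggallus celegans dmelanogaster cintestinalis trubripes xtropicalis mmulatta
A1CF 5.634789603 4.787491743 3.688879454 2.079441542 3.931825633 2.772588722 3.871201011 3.044522438 4.094344562
AAK1 3.583518938 2.708050201 2.079441542 2.197224577 2.079441542 0.693147181 2.772588722 2.079441542 3.218875825
AAMP 3.555348061 3.17805383 2.48490665 1.791759469 2.302585093 0.693147181 2.48490665 1.098612289 2.079441542",
  header = TRUE, stringsAsFactors = FALSE)
ctn_df <- read.table(text = "Species CTN
hsapiens 158
mmusculus 85
ggallus 67
celegans 32
dmelanogaster 27
cintestinalis 19
trubripes 110
xtropicalis 82
mmulatta 71
", header = TRUE, stringsAsFactors = FALSE)

result <- df %>%
  # one row per gene/species pair; species names go into 'Species'
  pivot_longer(-Human_Gene_Name, names_to = "Species",
               values_to = "functional_diversity") %>%
  # attach each species' CTN by matching on the species name
  left_join(ctn_df, by = "Species") %>%
  group_by(Human_Gene_Name) %>%
  # Pearson is cor()'s default method
  summarise(r = cor(functional_diversity, CTN))

print(result)
```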
# 1 A1CF hsapiens 5.6347896 158
# 2 AAK1 hsapiens 3.5835189 158
# 3 AAMP hsapiens 3.5553481 158
# 4 A1CF mmusculus 4.7874917 85
# 5 AAK1 mmusculus 2.7080502 85
# 6 AAMP mmusculus 3.1780538 85
# ....
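You can also do the same reshape and join in base R, without the tidyverse, using stack() and match(). This is a sketch, and the variable names stacked, long and r are illustrative:

```r
# Re-create the example data from the question
df <- read.table(text =
"Human_Gene_Name hsapiens mmusculus ggallus celegans dmelanogaster cintestinalis trubripes xtropicalis mmulatta
A1CF 5.634789603 4.787491743 3.688879454 2.079441542 3.931825633 2.772588722 3.871201011 3.044522438 4.094344562
AAK1 3.583518938 2.708050201 2.079441542 2.197224577 2.079441542 0.693147181 2.772588722 2.079441542 3.218875825
AAMP 3.555348061 3.17805383 2.48490665 1.791759469 2.302585093 0.693147181 2.48490665 1.098612289 2.079441542",
  header = TRUE, stringsAsFactors = FALSE)
ctn_df <- read.table(text = "Species CTN
hsapiens 158
mmusculus 85
ggallus 67
celegans 32
dmelanogaster 27
cintestinalis 19
trubripes 110
xtropicalis 82
mmulatta 71
", header = TRUE, stringsAsFactors = FALSE)

# stack() puts all species columns into one 'values' column, plus an
# 'ind' factor holding the original column (species) name
stacked <- stack(df[-1])

# stack() works column by column, so gene names repeat once per species
long <- data.frame(
  Human_Gene_Name = rep(df$Human_Gene_Name, times = ncol(df) - 1),
  Species = as.character(stacked$ind),
  functional_diversity = stacked$values,
  stringsAsFactors = FALSE
)

# Look up each species' CTN by name (NA if a species is missing from ctn_df)
long$CTN <- ctn_df$CTN[match(long$Species, ctn_df$Species)]

# Pearson correlation per gene
r <- sapply(split(long, long$Human_Gene_Name),
            function(d) cor(d$functional_diversity, d$CTN))
print(r)
```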
Upvotes: 0
Reputation: 263362
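Use apply to pass each row's numeric values to cor as the first argument (with ctn_df$CTN as the second), then name the resulting correlations with the first column. Note that this pairs values by position, so it relies on the species columns of 'df' being in the same order as the rows of 'ctn_df', which is true for the example data: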
setNames( apply(df[-1], 1, cor, ctn_df$CTN), df$Human_Gene_Name)
A1CF AAK1 AAMP
0.7556590 0.7834861 0.6829534
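If the species order might differ between the two tables, you can first align CTN to the columns of 'df' by name with match(). Then a single cor() call on the transposed matrix computes every per-gene correlation at once. This is a sketch. ctn_df is reordered on purpose only to show that row order no longer matters, and the names ctn and r are illustrative:

```r
# Re-create the example data from the question
df <- read.table(text =
"Human_Gene_Name hsapiens mmusculus ggallus celegans dmelanogaster cintestinalis trubripes xtropicalis mmulatta
A1CF 5.634789603 4.787491743 3.688879454 2.079441542 3.931825633 2.772588722 3.871201011 3.044522438 4.094344562
AAK1 3.583518938 2.708050201 2.079441542 2.197224577 2.079441542 0.693147181 2.772588722 2.079441542 3.218875825
AAMP 3.555348061 3.17805383 2.48490665 1.791759469 2.302585093 0.693147181 2.48490665 1.098612289 2.079441542",
  header = TRUE, stringsAsFactors = FALSE)
ctn_df <- read.table(text = "Species CTN
hsapiens 158
mmusculus 85
ggallus 67
celegans 32
dmelanogaster 27
cintestinalis 19
trubripes 110
xtropicalis 82
mmulatta 71
", header = TRUE, stringsAsFactors = FALSE)

# Deliberately shuffle ctn_df (alphabetical order) to show alignment is by name
ctn_df <- ctn_df[order(ctn_df$Species), ]

# CTN values in the same order as the species columns of df
ctn <- ctn_df$CTN[match(names(df)[-1], ctn_df$Species)]
stopifnot(!anyNA(ctn))  # every species column found a CTN

# Rows = species, columns = genes; cor(matrix, vector) correlates each
# column with the vector and returns a one-column matrix
r <- cor(t(as.matrix(df[-1])), ctn)[, 1]
names(r) <- df$Human_Gene_Name
print(r)
```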
Upvotes: 2