I have a data frame (over 9,000 rows) that holds the average expression of certain genes (rows) per cell cluster (columns). I now need to replace the gene names (row names) with their orthologs. It looks like this:
        Cluster1 Cluster2 Cluster3
Tppp2      10.32     0.14     2.56
Mtx1        6.32     8.77     0.30
Vps37c    225.02   132.87     9.52
Slc39a9    52.13    18.42     4.12
I also have a second data frame (over 13,000 rows) that holds the orthologs: the original gene name (GeneName, old) and the ortholog gene name (NewGeneName, new). It looks like this:
  GeneName NewGeneName
1   Vps37c      VPS37C
2    Tppp2       TPPP3
3  Slc39a9     SLC39A9
4     Mtx1       MTX1B
For each row of the first data frame, the row name should be looked up in dataframe2$GeneName, and the matching value of dataframe2$NewGeneName should become the new row name (or be stored in a vector that holds all the new row names in the correct order). For instance, Tppp2 becomes TPPP3, and so on, giving:
rownames(expr_df) <- c("TPPP3", "MTX1B", "VPS37C", "SLC39A9")
I have tried a lot of things, and it's really bugging me that I can't make it work. Sorry, I don't remember all the approaches I tried.
FYI: the ortholog table and the row names of the expression data frame are NOT in the same order, and the ortholog data frame contains more genes than the expression data frame.
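The lookup described above is what base R's match() does: for each row name it returns the position of the first matching entry in dataframe2$GeneName, so neither the different ordering nor the extra genes in the ortholog table get in the way. Below is a minimal sketch on the sample data, assuming dataframe2 has character columns GeneName and NewGeneName. Keeping the old name when a gene has no ortholog is an assumption added here, because rownames<- refuses NA row names.

```r
# Sample expression data from the question: genes as row names, clusters as columns
expr_df <- data.frame(
  Cluster1 = c(10.32, 6.32, 225.02, 52.13),
  Cluster2 = c(0.14, 8.77, 132.87, 18.42),
  Cluster3 = c(2.56, 0.30, 9.52, 4.12),
  row.names = c("Tppp2", "Mtx1", "Vps37c", "Slc39a9")
)

# Ortholog table in a different order (the real one also holds genes not in expr_df)
dataframe2 <- data.frame(
  GeneName    = c("Vps37c", "Tppp2", "Slc39a9", "Mtx1"),
  NewGeneName = c("VPS37C", "TPPP3", "SLC39A9", "MTX1B"),
  stringsAsFactors = FALSE
)

# Position of each current row name within dataframe2$GeneName (NA if absent)
idx <- match(rownames(expr_df), dataframe2$GeneName)

# New names, in the same order as the rows of expr_df
new_names <- dataframe2$NewGeneName[idx]

# Assumption: fall back to the old name for genes without an ortholog,
# because row names may not contain NA
new_names <- ifelse(is.na(new_names), rownames(expr_df), new_names)

rownames(expr_df) <- new_names
print(expr_df)
```

If two genes map to the same ortholog, rownames<- stops with a duplicate row names error. make.unique() is one way around that, at the cost of suffixed names.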
A tidyverse solution:
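library(dplyr)   # left_join() and the %>% pipe
library(tibble)  # rownames_to_column()
df1 %>%
  rownames_to_column() %>%                          # move the gene names into a "rowname" column
  left_join(df2, by = c("rowname" = "GeneName"))    # add NewGeneName; genes without a match get NA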
  rowname Cluster1 Cluster2 Cluster3 NewGeneName
1   Tppp2    10.32     0.14     2.56       TPPP3
2    Mtx1     6.32     8.77     0.30       MTX1B
3  Vps37c   225.02   132.87     9.52      VPS37C
4 Slc39a9    52.13    18.42     4.12     SLC39A9
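The join adds NewGeneName as a column but keeps the old names in rowname. To end up with the ortholog names as row names, as the question asks, one option is to drop the old names and move NewGeneName back into the row names with tibble's column_to_rownames(). This is a minimal sketch, assuming every gene in df1 has exactly one ortholog in df2. Otherwise the resulting NA or duplicated names make column_to_rownames() fail.

```r
library(dplyr)
library(tibble)

# Same sample data as in the Data section below
df1 <- data.frame(
  Cluster1 = c(10.32, 6.32, 225.02, 52.13),
  Cluster2 = c(0.14, 8.77, 132.87, 18.42),
  Cluster3 = c(2.56, 0.30, 9.52, 4.12),
  row.names = c("Tppp2", "Mtx1", "Vps37c", "Slc39a9")
)
df2 <- tibble(
  GeneName    = c("Vps37c", "Tppp2", "Slc39a9", "Mtx1"),
  NewGeneName = c("VPS37C", "TPPP3", "SLC39A9", "MTX1B")
)

renamed <- df1 %>%
  rownames_to_column() %>%                          # old gene names -> column "rowname"
  left_join(df2, by = c("rowname" = "GeneName")) %>% # attach NewGeneName, keeping df1's row order
  select(-rowname) %>%                              # drop the old names
  column_to_rownames("NewGeneName")                 # ortholog names become the row names

print(renamed)
```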
Data
df1 <- tibble::tribble(
~Cluster1, ~Cluster2, ~Cluster3,
10.32, 0.14, 2.56,
6.32, 8.77, 0.3,
225.02, 132.87, 9.52,
52.13, 18.42, 4.12
)
df1 <- as.data.frame(df1)
rownames(df1) <- c("Tppp2", "Mtx1", "Vps37c", "Slc39a9")
df2 <- tibble::tribble(
~GeneName, ~NewGeneName,
"Vps37c", "VPS37C",
"Tppp2", "TPPP3",
"Slc39a9", "SLC39A9",
"Mtx1", "MTX1B"
)
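On the full data (over 9,000 expressed genes, over 13,000 ortholog pairs), both the join and the rename assume a clean one-to-one mapping. The sketch below is a pre-flight check using check_orthologs(), a hypothetical helper written only for this example. It lists expressed genes with no ortholog, old names that appear more than once in the ortholog table, and ortholog names reached from more than one expressed gene.

```r
# Hypothetical helper (not part of any package): flags mapping problems
# that would produce NA or duplicated row names after renaming
check_orthologs <- function(gene_names, gene_map) {
  # Only the ortholog rows for genes that are actually expressed matter
  used  <- gene_map[gene_map$GeneName %in% gene_names, c("GeneName", "NewGeneName")]
  pairs <- unique(used)  # distinct (old, new) pairs
  list(
    # expressed genes with no entry in the ortholog table
    missing = setdiff(gene_names, gene_map$GeneName),
    # old names listed more than once (a left join would duplicate those rows)
    dup_old = unique(used$GeneName[duplicated(used$GeneName)]),
    # new names reached from more than one distinct old name (row names must be unique)
    dup_new = unique(pairs$NewGeneName[duplicated(pairs$NewGeneName)])
  )
}

df2 <- data.frame(
  GeneName    = c("Vps37c", "Tppp2", "Slc39a9", "Mtx1"),
  NewGeneName = c("VPS37C", "TPPP3", "SLC39A9", "MTX1B"),
  stringsAsFactors = FALSE
)
genes <- c("Tppp2", "Mtx1", "Vps37c", "Slc39a9")  # stands in for rownames(df1)

str(check_orthologs(genes, df2))  # all three entries are empty for the sample data

# "NotAGene" is a made-up name with no ortholog, so it is reported under $missing
check_orthologs(c(genes, "NotAGene"), df2)$missing
```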