Reputation: 481
I have the following data frame:
> df
GENE ID EXPR1 EXPR2
ENS127 1122O 1.2 1.2
ENS127 1122O 2.3 1.0
ENS555 33212 4.5 3.9
ENS555 33212 1.2 3.7
ENS941 44444 2.3 3.6
I'm looking for a way to get all rows with similar GENE in one, so that for each unique GENE there is only one row containing all the values of the third column onward. This is going to be iterated utill end of a big data frame.
The output would look like this:
> df.output
GENE ID EXPR1 EXPR2 EXPR.01 EXPR.02
ENS127 1122O 1.2 1.2 2.3 1.0
ENS555 33212 4.5 3.9 1.2 3.7
ENS941 44444 2.3 3.6 NA NA
I appreciate any help.
Upvotes: 1
Views: 47
Reputation: 8844
Here is a data.table
solution
library(data.table)
setDT(df)[, rid := rowid(GENE, ID)]
dcast(df, GENE + ID ~ rid, sep = ".", value.var = c("EXPR1", "EXPR2"))
Output
GENE ID EXPR1.1 EXPR1.2 EXPR2.1 EXPR2.2
1: ENS127 1122O 1.2 2.3 1.2 1.0
2: ENS555 33212 4.5 1.2 3.9 3.7
3: ENS941 44444 2.3 NA 3.6 NA
Upvotes: 0
Reputation: 388982
You can get the data in long format so that all the EXPR
values are in one column, create a unique column names for each GENE
and then get the data in wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('EXPR')) %>%
group_by(GENE, name) %>%
mutate(name = paste(name, row_number(), sep = '_')) %>%
pivot_wider()
# GENE ID EXPR1_1 EXPR2_1 EXPR1_2 EXPR2_2
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 ENS127 1122O 1.2 1.2 2.3 1
#2 ENS555 33212 4.5 3.9 1.2 3.7
#3 ENS941 44444 2.3 3.6 NA NA
data
df <- structure(list(GENE = c("ENS127", "ENS127", "ENS555", "ENS555",
"ENS941"), ID = c("1122O", "1122O", "33212", "33212", "44444"
), EXPR1 = c(1.2, 2.3, 4.5, 1.2, 2.3), EXPR2 = c(1.2, 1, 3.9,
3.7, 3.6)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1