Maya_Cent
Maya_Cent

Reputation: 481

Add rows content to the column end of the another row

I have the following data frame:

> df
GENE     ID     EXPR1     EXPR2
ENS127   1122O   1.2       1.2
ENS127   1122O   2.3       1.0
ENS555   33212   4.5       3.9
ENS555   33212   1.2       3.7
ENS941   44444   2.3       3.6

I'm looking for a way to get all rows with similar GENE in one, so that for each unique GENE there is only one row containing all the values of the third column onward. This is going to be iterated utill end of a big data frame.
The output would look like this:

> df.output
GENE     ID     EXPR1   EXPR2   EXPR.01   EXPR.02  
ENS127   1122O   1.2     1.2     2.3        1.0     
ENS555   33212   4.5     3.9     1.2        3.7
ENS941   44444   2.3     3.6     NA        NA

I appreciate any help.

Upvotes: 1

Views: 47

Answers (2)

ekoam
ekoam

Reputation: 8844

Here is a data.table solution

library(data.table)
setDT(df)[, rid := rowid(GENE, ID)]
dcast(df, GENE + ID ~ rid, sep = ".", value.var = c("EXPR1", "EXPR2"))

Output

     GENE    ID EXPR1.1 EXPR1.2 EXPR2.1 EXPR2.2
1: ENS127 1122O     1.2     2.3     1.2     1.0
2: ENS555 33212     4.5     1.2     3.9     3.7
3: ENS941 44444     2.3      NA     3.6      NA

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

You can get the data in long format so that all the EXPR values are in one column, create a unique column names for each GENE and then get the data in wide format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('EXPR')) %>%
  group_by(GENE, name) %>%
  mutate(name = paste(name, row_number(), sep = '_')) %>%
  pivot_wider()

#   GENE   ID    EXPR1_1 EXPR2_1 EXPR1_2 EXPR2_2
#  <chr>  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
#1 ENS127 1122O     1.2     1.2     2.3     1  
#2 ENS555 33212     4.5     3.9     1.2     3.7
#3 ENS941 44444     2.3     3.6    NA      NA  

data

df <- structure(list(GENE = c("ENS127", "ENS127", "ENS555", "ENS555", 
"ENS941"), ID = c("1122O", "1122O", "33212", "33212", "44444"
), EXPR1 = c(1.2, 2.3, 4.5, 1.2, 2.3), EXPR2 = c(1.2, 1, 3.9, 
3.7, 3.6)), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

Related Questions