user2806363

Reputation: 2593

How to concatenate associated row names in R?

I'm new to R and want to do some simple operations on the columns of my files. Could someone help me with this?

I have two large files, A and B. In file A, each name in the first column (family) is associated with several names in the second column (ID). For every family in file A, I want to collect all of its associated IDs and write them, joined with "/", as the second column of file B, next to each matching family.

Here is the structure of my files and the desired output:

File A:

 family               ID    
let-7/98/4458/4500      hsa-let-7a  
let-7/98/4458/4500      hsa-let-7b  
let-7/98/4458/4500      hsa-let-7c  
let-7/98/4458/4500      hsa-let-7d  
let-7/98/4458/4500      hsa-let-7e  
let-7/98/4458/4500      hsa-let-7f  
let-7/98/4458/4500      hsa-miR-98  
miR-1ab/206/613         hsa-miR-1   
miR-1ab/206/613         hsa-miR-206 
.
.
.
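To work with the files in R, they first have to be read into data frames. A minimal sketch, assuming file A is whitespace-separated with a header row; the path "fileA.txt" is hypothetical, so an excerpt of the file is inlined with read.table(text = ...) to keep the example runnable:

```r
# Read a whitespace-separated table with a header row into a data frame.
# For a real file, pass its path instead, e.g. read.table("fileA.txt", header = TRUE)
# ("fileA.txt" is a hypothetical name). An excerpt of file A is inlined here.
A <- read.table(text = "
family               ID
let-7/98/4458/4500   hsa-let-7a
let-7/98/4458/4500   hsa-let-7b
miR-1ab/206/613      hsa-miR-1
miR-1ab/206/613      hsa-miR-206
", header = TRUE, stringsAsFactors = FALSE)

# Two character columns: family and ID
str(A)
```

stringsAsFactors = FALSE keeps both columns as plain character vectors, which is what the pasting step needs.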

Desired output for file A:

family                   ID
let-7/98/4458/4500       hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
miR-1ab/206/613          hsa-miR-1/hsa-miR-206
.
.
.
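The collapsing step behind "output A" can be sketched in base R with tapply(), which splits ID by family and applies paste(..., collapse = "/") to each group. This is an alternative to the aggregate() call used in the answer below, not the answer's own method:

```r
A <- data.frame(
  family = c(rep("let-7/98/4458/4500", 7), rep("miR-1ab/206/613", 2)),
  ID = c("hsa-let-7a", "hsa-let-7b", "hsa-let-7c", "hsa-let-7d", "hsa-let-7e",
         "hsa-let-7f", "hsa-miR-98", "hsa-miR-1", "hsa-miR-206"),
  stringsAsFactors = FALSE
)

# Split ID by family and paste each group into one "/"-separated string.
# The result is a 1-d array whose names are the (sorted) families.
ids_by_family <- tapply(A$ID, A$family, paste, collapse = "/")

# Turn it back into a two-column data frame shaped like "output A"
outA <- data.frame(family = names(ids_by_family),
                   ID = as.vector(ids_by_family),
                   stringsAsFactors = FALSE)
outA
```

Within each family the IDs keep the order in which they appear in A.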

File B:

let-7/98/4458/4500
let-7/98/4458/4500
miR-1ab/206/613
miR-1ab/206/613
miR-1ab/206/613
miR-1ab/206/613
.
.

Desired output for file B:

let-7/98/4458/4500     hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
let-7/98/4458/4500     hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
miR-1ab/206/613        hsa-miR-1/hsa-miR-206
miR-1ab/206/613        hsa-miR-1/hsa-miR-206
miR-1ab/206/613        hsa-miR-1/hsa-miR-206
miR-1ab/206/613        hsa-miR-1/hsa-miR-206
.
.
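Filling in file B is then a lookup: for each row of B, find its family among the collapsed results. A sketch using tapply() plus match(), which keeps B's rows in their original order (rows whose family does not occur in A would get NA):

```r
A <- data.frame(
  family = c(rep("let-7/98/4458/4500", 7), rep("miR-1ab/206/613", 2)),
  ID = c("hsa-let-7a", "hsa-let-7b", "hsa-let-7c", "hsa-let-7d", "hsa-let-7e",
         "hsa-let-7f", "hsa-miR-98", "hsa-miR-1", "hsa-miR-206"),
  stringsAsFactors = FALSE
)
B <- data.frame(family = rep(c("let-7/98/4458/4500", "miR-1ab/206/613"), c(2, 4)),
                stringsAsFactors = FALSE)

# One "/"-separated string of IDs per family, named by family
ids_by_family <- tapply(A$ID, A$family, paste, collapse = "/")

# match() returns, for each family in B, its position among the names of
# ids_by_family; indexing with those positions gives one ID string per row of B
B$ID <- as.vector(ids_by_family)[match(B$family, names(ids_by_family))]
B
```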

Upvotes: 0

Views: 83

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

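A demonstration of my comment: use aggregate() to paste together all IDs that share a family, then merge() that table with B on their common "family" column: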

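# Collapse all IDs of each family into one "/"-separated string,
# then join the result onto B by their shared "family" column
out <- merge(aggregate(ID ~ family, A, paste, collapse="/"), B)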
out
#               family
# 1 let-7/98/4458/4500
# 2 let-7/98/4458/4500
# 3    miR-1ab/206/613
# 4    miR-1ab/206/613
# 5    miR-1ab/206/613
# 6    miR-1ab/206/613
#                                                                             ID
# 1 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
# 2 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
# 3                                                        hsa-miR-1/hsa-miR-206
# 4                                                        hsa-miR-1/hsa-miR-206
# 5                                                        hsa-miR-1/hsa-miR-206
# 6                                                        hsa-miR-1/hsa-miR-206
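To write the merged result back to disk in the layout of the desired file B, write.table() can be used. A sketch assuming a tab-separated file without header, quotes, or row names is wanted; tempfile() stands in for the real output path, whose name is not given in the question:

```r
A <- data.frame(
  family = c(rep("let-7/98/4458/4500", 7), rep("miR-1ab/206/613", 2)),
  ID = c("hsa-let-7a", "hsa-let-7b", "hsa-let-7c", "hsa-let-7d", "hsa-let-7e",
         "hsa-let-7f", "hsa-miR-98", "hsa-miR-1", "hsa-miR-206"),
  stringsAsFactors = FALSE
)
B <- data.frame(family = rep(c("let-7/98/4458/4500", "miR-1ab/206/613"), c(2, 4)),
                stringsAsFactors = FALSE)

out <- merge(aggregate(ID ~ family, A, paste, collapse = "/"), B)

# tempfile() is a placeholder for the real output file path
path <- tempfile(fileext = ".txt")
write.table(out, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to check what was written: one line per row of out
lines <- readLines(path)
lines
```

Note that merge() sorts its result by the key column by default, so the rows come out grouped by family rather than necessarily in B's original order.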

This uses the following sample data for A and B:

A <- structure(
  list(family = c("let-7/98/4458/4500","let-7/98/4458/4500","let-7/98/4458/4500",
                  "let-7/98/4458/4500","let-7/98/4458/4500","let-7/98/4458/4500",
                  "let-7/98/4458/4500","miR-1ab/206/613","miR-1ab/206/613"),
       ID = c("hsa-let-7a","hsa-let-7b","hsa-let-7c","hsa-let-7d","hsa-let-7e",
              "hsa-let-7f", "hsa-miR-98", "hsa-miR-1","hsa-miR-206")),
       .Names = c("family", "ID"), class = "data.frame", row.names = c(NA, -9L))

B <- structure(
  list(family = c("let-7/98/4458/4500","let-7/98/4458/4500","miR-1ab/206/613",
                  "miR-1ab/206/613","miR-1ab/206/613", "miR-1ab/206/613")),
  .Names = "family", class = "data.frame", row.names = c(NA, -6L))
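One thing to keep in mind with real data: merge() performs an inner join by default, so any row of B whose family does not occur in A is silently dropped. Passing all.y = TRUE keeps those rows, with NA in ID. A sketch using a made-up family "miR-xyz" that is not in A:

```r
A <- data.frame(
  family = c(rep("let-7/98/4458/4500", 7), rep("miR-1ab/206/613", 2)),
  ID = c("hsa-let-7a", "hsa-let-7b", "hsa-let-7c", "hsa-let-7d", "hsa-let-7e",
         "hsa-let-7f", "hsa-miR-98", "hsa-miR-1", "hsa-miR-206"),
  stringsAsFactors = FALSE
)
# "miR-xyz" is a hypothetical family that does not occur in A
B <- data.frame(family = c("miR-1ab/206/613", "let-7/98/4458/4500", "miR-xyz"),
                stringsAsFactors = FALSE)

agg <- aggregate(ID ~ family, A, paste, collapse = "/")

inner <- merge(agg, B)               # default inner join: the "miR-xyz" row is dropped
outer <- merge(agg, B, all.y = TRUE) # keep every row of B; unmatched IDs become NA

nrow(inner)  # 2
nrow(outer)  # 3
```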

Upvotes: 1
