Reputation: 2593
I'm new to R and I want to do some simple operations on the columns of my files. Could someone help me with this?
I have two big files, A and B. There is a specific pattern in columns I and II of file A that I want to capture and transfer to the second column of file B. Basically, every name in column I of file A has several different names associated with it in column II. I want to write all the associated names for every name in the first column into file B.
Here is the structure of my files and desired output:
File A :
family ID
let-7/98/4458/4500 hsa-let-7a
let-7/98/4458/4500 hsa-let-7b
let-7/98/4458/4500 hsa-let-7c
let-7/98/4458/4500 hsa-let-7d
let-7/98/4458/4500 hsa-let-7e
let-7/98/4458/4500 hsa-let-7f
let-7/98/4458/4500 hsa-miR-98
miR-1ab/206/613 hsa-miR-1
miR-1ab/206/613 hsa-miR-206
.
.
.
Output for file A:
miR family ID
let-7/98/4458/4500 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
.
.
.
File B:
let-7/98/4458/4500
let-7/98/4458/4500
miR-1ab/206/613
miR-1ab/206/613
miR-1ab/206/613
miR-1ab/206/613
.
.
Desired output for file B:
let-7/98/4458/4500 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
let-7/98/4458/4500 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
.
.
Upvotes: 0
Views: 83
Reputation: 193517
A demonstration of my comment: collapse the IDs for each family with aggregate(), then merge() the result with B.
out <- merge(aggregate(ID ~ family, A, paste, collapse="/"), B)
out
# family
# 1 let-7/98/4458/4500
# 2 let-7/98/4458/4500
# 3 miR-1ab/206/613
# 4 miR-1ab/206/613
# 5 miR-1ab/206/613
# 6 miR-1ab/206/613
# ID
# 1 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
# 2 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
# 3 hsa-miR-1/hsa-miR-206
# 4 hsa-miR-1/hsa-miR-206
# 5 hsa-miR-1/hsa-miR-206
# 6 hsa-miR-1/hsa-miR-206
This is given the following sample data for "A" and "B":
A <- structure(
  list(family = c("let-7/98/4458/4500","let-7/98/4458/4500","let-7/98/4458/4500",
                  "let-7/98/4458/4500","let-7/98/4458/4500","let-7/98/4458/4500",
                  "let-7/98/4458/4500","miR-1ab/206/613","miR-1ab/206/613"),
       ID = c("hsa-let-7a","hsa-let-7b","hsa-let-7c","hsa-let-7d","hsa-let-7e",
              "hsa-let-7f", "hsa-miR-98", "hsa-miR-1","hsa-miR-206")),
  .Names = c("family", "ID"), class = "data.frame", row.names = c(NA, -9L))
B <- structure(
  list(family = c("let-7/98/4458/4500","let-7/98/4458/4500","miR-1ab/206/613",
                  "miR-1ab/206/613","miR-1ab/206/613", "miR-1ab/206/613")),
  .Names = "family", class = "data.frame", row.names = c(NA, -6L))
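One caveat about the merge() call above: by default it sorts the result on the key column, so the rows come back in B's order here only because the sample B is already sorted by family. If the row order of your real file B matters, a lookup with match() may be a safer option. The following is only a minimal sketch under that assumption; the shortened sample data and the name "collapsed" are made up for illustration, and B is deliberately put out of sorted order:

```r
# Shortened, illustrative sample data (not the full data from the question).
A <- data.frame(
  family = c(rep("let-7/98/4458/4500", 3), rep("miR-1ab/206/613", 2)),
  ID     = c("hsa-let-7a", "hsa-let-7b", "hsa-miR-98", "hsa-miR-1", "hsa-miR-206"),
  stringsAsFactors = FALSE
)
# B deliberately lists the families out of sorted order.
B <- data.frame(
  family = c("miR-1ab/206/613", "let-7/98/4458/4500", "miR-1ab/206/613"),
  stringsAsFactors = FALSE
)

# Step 1: one row per family, with its IDs collapsed into a "/"-separated string.
collapsed <- aggregate(ID ~ family, data = A, FUN = paste, collapse = "/")

# Step 2: look up each row of B in the collapsed table.
# match() returns positions in B's own order, so B's rows are not reordered.
B$ID <- collapsed$ID[match(B$family, collapsed$family)]
B
```

Any family that appears in B but not in A would get NA in the new column, since match() returns NA when it finds nothing.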
Upvotes: 1