R Data Manipulation Help: For each value in column B, look in column A for the "best match" and set value to that value

Question

I am doing some genetics work and I am trying to utilize two data sets, however each one refers to the genes differently. One dataset (lets call it Column A) refers to each gene by its conventional name as well as a specific ID. The other data set uses some combination of those two variables. I'd rather not have to rerun the pipeline to get the names to align, so I was hoping I could match and copy values as per the title. To reiterate, I was hoping that for each value in column B, look in column A for the "best match" and set value to that value.

Below is an example of the columns and my issues

Column A	Column B
gene1-ID001	gene1
gene2-ID002	ID002
gene3-ID003	gene3-ID003
gene4-ID004	gene4

The desired output is as follows:

Column A	Column B
gene1-ID001	gene1-ID001
gene2-ID002	gene2-ID002
gene3-ID003	gene3-ID003
gene4-ID004	gene4-ID004

The names/gene IDs in each data set are tied to different values and are not arranged in a specific order.

Happy to learn about any established tools for these kinds of issues as well. Thanks you.

I'm fairly certain I need to use grep in some manner to check for partial matches before setting values, but the finesse in setting this up is something I'm lacking. Handling issues when there are more than one partial matches and picking the "most matching" is a problem.

R Data Manipulation Help: For each value in column B, look in column A for the "best match" and set value to that value

Answers (1)

Related Questions

R Data Manipulation Help: For each value in column B, look in column A for the &quot;best match&quot; and set value to that value

Answers (1)

Related Questions

R Data Manipulation Help: For each value in column B, look in column A for the "best match" and set value to that value