Nerdbert
Nerdbert

Reputation: 3

R: Merging/Match dataframes with semicolon-seperated values in the defining columns

Please apologize if this question is offendingly easy, but I am still an absulte beginner in R. I had some introduction courses. I know the basics. I can fairly well read it, but I still find myself lost.

Here is my question: I have two dataframes, that somehow look like this.

Table1 <- data.frame("Name"=c("aaa","bbb-2; bbb", "ddd", "eee", "fff-2; fff"),
"Values1"= c(2, 0.5, 0.1, 3, 4))

Table2 <- data.frame ("Name"=c("aaa", "ccc", "bbb; ddd", "fff"),
"Values2"= c(5, 2, 1, NA), "Values3"= c(4, NA, 1, NA))

I would like to merge the dataframes by the Names columns, but in these name columns are some entries that are semicolon-seperated (e.g. "bbb-2; bbb") and some have some unnecessary additons like "-2".

In Table1, it seems if only those entries with "-2" but otherwise same names are seperated with the semicolon (e.g. "bbb-2; bbb"). In Table2 different names are seperated. (e.g "bbb; ddd")

I would like to add the values from Table2 to Table1, if either one of the semicolon-seperated values match. I want to keep all data in Table1. But not all data from in Table2. If Table1 was not enriched by data from Table2, the value should be ""/NULL, not 0.

However, I would like to avoid duplicating data by dividing the entries in seperate rows because of statistics!

Upvotes: 0

Views: 56

Answers (1)

akrun
akrun

Reputation: 887841

We can do a sub to remove those substring

Table1$Name <- sub("-\\d+", "", Table1$Name)

and then do the merge

merge(Table1, Table2, by = "Name", all.x = TRUE)

Upvotes: 0

Related Questions