Reputation: 3
Please apologize if this question is offendingly easy, but I am still an absulte beginner in R. I had some introduction courses. I know the basics. I can fairly well read it, but I still find myself lost.
Here is my question: I have two dataframes, that somehow look like this.
Table1 <- data.frame("Name"=c("aaa","bbb-2; bbb", "ddd", "eee", "fff-2; fff"),
"Values1"= c(2, 0.5, 0.1, 3, 4))
Table2 <- data.frame ("Name"=c("aaa", "ccc", "bbb; ddd", "fff"),
"Values2"= c(5, 2, 1, NA), "Values3"= c(4, NA, 1, NA))
I would like to merge the dataframes by the Names columns, but in these name columns are some entries that are semicolon-seperated (e.g. "bbb-2; bbb") and some have some unnecessary additons like "-2".
In Table1, it seems if only those entries with "-2" but otherwise same names are seperated with the semicolon (e.g. "bbb-2; bbb"). In Table2 different names are seperated. (e.g "bbb; ddd")
I would like to add the values from Table2 to Table1, if either one of the semicolon-seperated values match. I want to keep all data in Table1. But not all data from in Table2. If Table1 was not enriched by data from Table2, the value should be ""/NULL, not 0.
However, I would like to avoid duplicating data by dividing the entries in seperate rows because of statistics!
Upvotes: 0
Views: 56
Reputation: 887841
We can do a sub
to remove those substring
Table1$Name <- sub("-\\d+", "", Table1$Name)
and then do the merge
merge(Table1, Table2, by = "Name", all.x = TRUE)
Upvotes: 0