Reputation: 3393
I am trying to add several columns to a data.frame from another data.frame.
The data.frame from which I want to add columns:
head(fix)[1:2,]
Year Name Moders.hjälp. Utg.Sjukvård. Antal.Fall.Moderskapshjälp. Antal.Dagar.Moderskapshjälp. Antal.Dödsfall.
1 1921 Allians 2003 NA 42 1603 43
2 1921 Bageri- och konditoriindustriarb. I Stocholm NA NA NA NA 10
In other words, I want to add fix[,3:ncol(fix)] to data:
head(data)[1:4,]
Year Name Delägare.män. Delägare.kvinnor. Sjukdomsfall.män.
92 1921 Sbk. Allians 2416 1610 526
198 1921 Bageri- och Konditoriindustriarb. I Stockholm sbh-k. 143 13 19
by matching on the Year column and the Name column.
The problem is that the Name column in fix and in data above contains slightly different names (e.g. Allians vs. Sbk. Allians). I can't find a solution that matches parts of strings to find similar names. I tried to use match but didn't succeed.
Here is the dput output for both data.frames:
dput(head(fix)[1:2,])
structure(list(Year = c(1921L, 1921L), Name = c("Allians", "Bageri- och konditoriindustriarb. I Stocholm"
), Moders.hjälp. = c(2003, NA), Utg.Sjukvård. = c(NA_integer_,
NA_integer_), Antal.Fall.Moderskapshjälp. = c(42L, NA), Antal.Dagar.Moderskapshjälp. = c(1603L,
NA), Antal.Dödsfall. = c(43L, 10L)), .Names = c("Year", "Name",
"Moders.hjälp.", "Utg.Sjukvård.", "Antal.Fall.Moderskapshjälp.",
"Antal.Dagar.Moderskapshjälp.", "Antal.Dödsfall."), row.names = 1:2, class = "data.frame")
dput(head(data)[,c(1:2,11:13)])
structure(list(Year = c(1921L, 1924L, 1921L, 1924L, 1921L, 1924L
), Name = c("Sbk. Allians", "Sbk. Allians", "Bageri- och Konditoriindustriarb. I Stockholm sbh-k.",
"Bageri- och Konditoriindustriarb. I Stockholm sbh-k.", "Bergsunds verkstads arbetares sbk",
"Bergsunds verkstads arbetares sbk"), Delägare.män. = c(2416L,
3896L, 143L, 129L, 280L, 289L), Delägare.kvinnor. = c(1610L,
4300L, 13L, 13L, 2L, NA), Sjukdomsfall.män. = c(526L, 1084L,
19L, 34L, 100L, 97L)), .Names = c("Year", "Name", "Delägare.män.",
"Delägare.kvinnor.", "Sjukdomsfall.män."), class = "data.frame", row.names = c(92L,
93L, 198L, 199L, 222L, 223L))
Grateful for any suggestions!
Upvotes: 1
Views: 435
Reputation: 13280
You can use agrep, which does approximate (fuzzy) string matching:
sapply(data$Name, function(x) agrep(x, fix$Name, max.distance=0.4))
This matches each entry of data$Name against fix$Name and returns the indices of the approximate matches in fix$Name. You could also experiment with max.distance (perhaps in a loop). Afterwards you can merge, index, etc. whatever you want using the matches.
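As a rough illustration of trying several max.distance values, the sketch below counts how many candidates in fix each name from data picks up at each threshold. The names are copied from the question's dput output; the variable names (data_names, fix_names, distances, counts) and the chosen thresholds are assumptions made for this example only.

```r
# Names copied from the question's dput output
data_names <- c("Sbk. Allians",
                "Bageri- och Konditoriindustriarb. I Stockholm sbh-k.",
                "Bergsunds verkstads arbetares sbk")
fix_names <- c("Allians", "Bageri- och konditoriindustriarb. I Stocholm")

# Candidate thresholds to compare (arbitrary choices for illustration)
distances <- c(0.1, 0.2, 0.3, 0.4)

# For each threshold, count how many fix names each data name matches
counts <- sapply(distances, function(d) {
  sapply(data_names,
         function(x) length(agrep(x, fix_names, max.distance = d)),
         USE.NAMES = FALSE)
})
rownames(counts) <- data_names
colnames(counts) <- paste0("d=", distances)
print(counts)
```

A threshold where every name you expect to match has a count of exactly 1 is probably a reasonable starting point; counts above 1 suggest the threshold is too loose for that name.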
Update
Something along these lines should do the job for you:
# approximate-match each name in data against the names in fix
matches <- sapply(data$Name, function(x) agrep(x, fix$Name, max.distance = 0.4))
# keep the first match per name, or NA when nothing matched
matches_cleaned <- sapply(matches, function(x) if (length(x) > 0) x[1] else NA_integer_)
# add the matched fix names to data
data$fix_names <- fix$Name[matches_cleaned]
# merge on Year and the matched name
merge(data, fix, by.x = c('Year', 'fix_names'), by.y = c('Year', 'Name'))
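Note that merge drops rows without a matching key by default. If you would rather keep every row of data and get NA in the added columns where nothing matched, a variant along these lines might help. It is only a sketch: data_small and fix_small are cut-down stand-ins for the question's data frames, the ASCII column names Delagare_man and Antal_Dodsfall are simplified placeholders, first_match is a hypothetical helper, and all.x = TRUE is an addition that is not part of the code above.

```r
# Cut-down stand-ins for the question's data frames (values from the dput output)
data_small <- data.frame(
  Year = c(1921L, 1924L, 1921L),
  Name = c("Sbk. Allians", "Sbk. Allians", "Bergsunds verkstads arbetares sbk"),
  Delagare_man = c(2416L, 3896L, 280L),   # placeholder column name
  stringsAsFactors = FALSE
)
fix_small <- data.frame(
  Year = c(1921L, 1921L),
  Name = c("Allians", "Bageri- och konditoriindustriarb. I Stocholm"),
  Antal_Dodsfall = c(43L, 10L),           # placeholder column name
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first approximate match in fix_small$Name, or NA
first_match <- function(x) {
  hits <- agrep(x, fix_small$Name, max.distance = 0.4)
  if (length(hits) > 0) hits[1] else NA_integer_
}

idx <- vapply(data_small$Name, first_match, integer(1), USE.NAMES = FALSE)
data_small$fix_names <- fix_small$Name[idx]

# all.x = TRUE keeps rows of data_small that have no partner in fix_small
merged <- merge(data_small, fix_small,
                by.x = c("Year", "fix_names"), by.y = c("Year", "Name"),
                all.x = TRUE)
print(merged)
```

Because fix_small has one row per Year and Name combination, the result has exactly as many rows as data_small, which makes it easy to check that nothing was lost.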
Upvotes: 4