Reputation: 770
I have addresses I need to compare. I got 90% of the way there thanks to a helpful answer on this site, but I need the last 10%.
I have the code below to generate addresses for comparison. I need to see if there is any difference between addr1 and addr2.
eg_data <- data.frame(
  addr1 = c('123 Main St', '742 Evergreen Ter', '8435 Roanoke Dr', '1340 N State Pkwy'),
  addr2 = c('123 Main St Apt 4', '742 Evergreen Terrace', '8435 Roanoke Dr Unit 5', '1340 N State Pkwy'),
  stringsAsFactors = FALSE
)
The next part, which was very helpful, combines vsetdiff from the vecsets package with strsplit to compare the two and extract any difference:
library(vecsets)  # provides vsetdiff
eg_data$addr_comp2_1 <- mapply(vsetdiff, strsplit(eg_data$addr2, split = ""), strsplit(eg_data$addr1, split = ""))
Running this leaves each difference as a vector of single characters, such as c(" ","A","p","t"," ","4") for the row 1 addresses, and the column itself is a list. I need this column to hold one string (or factor) per row. In the data view, I need to see addr_comp2_1 : chr "123..." rather than addr_comp2_1 :List of 4, so that the data frame itself gives me " Apt 4" in column 3, row 1 and not c(" ","A","p","t"," ","4").
I have tried
eg_data$fix <- paste(eg_data$addr_comp2_1, collapse=', ')
eg_data$fix2 <- str_c(eg_data$addr_comp2_1, collapse=',')
eg_data$fix3 <- as.factor(eg_data$addr_comp2_1)
eg_data$fix4 <- lapply(eg_data$addr_comp2_1, unlist)
eg_data$fix5 <- (matrix(unlist(eg_data$addr_comp2_1), nrow=4,
byrow=F))
eg_data$fix6 <- unlist(eg_data$addr_comp2_1, use.names=FALSE,
recursive=FALSE)
These obviously don't work. fix5 is close, but it gives each individual character its own row instead of keeping the c() groupings, so I end up with 17 rows instead of a single new column of four.
Any help is appreciated.
Upvotes: 0
Reputation: 1210
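You just have to concatenate the characters in each list element. sapply with paste will do it for you. Unlike lapply, which returns a list again, sapply simplifies the result to a character vector, so the new column shows up as chr.
Code
library(vecsets)  # provides vsetdiff
eg_data <- data.frame(
  addr1 = c('123 Main St', '742 Evergreen Ter', '8435 Roanoke Dr', '1340 N State Pkwy'),
  addr2 = c('123 Main St Apt 4', '742 Evergreen Terrace', '8435 Roanoke Dr Unit 5', '1340 N State Pkwy'),
  stringsAsFactors = FALSE
)
# Per-row multiset difference of the characters in addr2 and addr1 (a list column)
eg_data$addr_comp2_1 <- mapply(vsetdiff, strsplit(eg_data$addr2, split = ""), strsplit(eg_data$addr1, split = ""))
# Collapse each character vector into one string; rows with no difference become ""
eg_data$addr_comp2_2 <- sapply(eg_data$addr_comp2_1, paste, collapse = '')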
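To see why the choice of apply function matters here, the following is a minimal base R sketch. It assumes a hand-written list called diffs as a hypothetical stand-in for the mapply(vsetdiff, ...) result, so it runs without vecsets installed. It uses vapply, a stricter alternative to sapply that is swapped in here only because it guarantees one string per element.

```r
# Hypothetical stand-in for the list column produced by mapply(vsetdiff, ...);
# the values are illustrative, not computed from eg_data.
diffs <- list(
  c(" ", "A", "p", "t", " ", "4"),
  c("r", "a", "c", "e"),
  character(0)  # identical addresses leave nothing behind
)

# lapply always returns a list, so a column built from it stays "List of n"
as_list <- lapply(diffs, paste, collapse = "")

# vapply returns an atomic character vector and checks that each result
# is exactly one string
as_chr <- vapply(diffs, paste, character(1), collapse = "")

is.list(as_list)      # TRUE
is.character(as_chr)  # TRUE
as_chr[1]             # " Apt 4"
```

Assigning as_chr to a data frame column gives an ordinary chr column, and rows with no difference hold an empty string.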
Upvotes: 1