Reputation: 1414
I have 5 weeks of measured data in 5 separate CSV files, and am looking for a way to merge them into a single document that makes sense. The issue I'm having is that not all data points are present in each file, my largest has ~20k rows and my smallest has ~2k so there isn't a 1:1 relation. Here's what my data looks like:
Keyword URL 5/12 Rank
activity site.com 2
activity site.com/page 1
backup site.com/backup 4
The next file would look something like this:
Keyword URL 5/19 Rank
activity site.com/page 2
database site.com/data 3
What I'd like to end up with is something like this
Keyword URL 5/12 Rank 5/19 Rank
activity site.com 2 -
activity site.com/page 1 2
backup site.com/backup 4 -
database site.com/data - 3
My preference would be to do this with R. I think plyr will make this a snap, but I've never used it before and I'm just not getting how this comes together.
Upvotes: 0
Views: 122
Reputation: 12829
Use merge
:
csv1 <- read.table(header=TRUE, text="
Keyword URL 5/12_Rank
activity site.com 2
activity site.com/page 1
backup site.com/backup 4
")
csv2 <- read.table(header=TRUE, text="
Keyword URL 5/19_Rank
activity site.com/page 2
database site.com/data 3
")
csv12 <- merge(csv1, csv2, all=TRUE)
#> csv12
# Keyword URL X5.12_Rank X5.19_Rank
#1 activity site.com 2 NA
#2 activity site.com/page 1 2
#3 backup site.com/backup 4 NA
#4 database site.com/data NA 3
If you have several tables, you can put them in a list
and use Reduce
:
csv3 <- read.table(header=TRUE, text="
Keyword URL 5/42_Rank
activity site.com 5
html site.com/data 6
")
L <- list(csv1, csv2, csv3)
Reduce(f=function(x,y)merge(x,y,all=TRUE), L)
Result
# Keyword URL X5.12_Rank X5.19_Rank X5.42_Rank
#1 activity site.com 2 NA 5
#2 activity site.com/page 1 2 NA
#3 backup site.com/backup 4 NA NA
#4 database site.com/data NA 3 NA
#5 html site.com/data NA NA 6
Upvotes: 2