Reputation: 87
I couldn't find this question answered.
I am trying to read in multiple files that are very similar. They contain two rows, one ID and one Value. The files share most of the IDs, but there may be some variance.
I want to have one column of all of the IDs (non-repeating) and the rest of the columns contain the value that ID had in each file.
I've looked into using rbind and *apply functions, but it looks like these aren't going to do what I need.
Any ideas?
Data Example
table1.txt
ID Value
Steve 25
Jim 50
table2.txt
ID Value
Jim 35
Dave 12
table3.txt
ID Value
Jim 90
Steve 40
Dave 12
With the end goal being
ID table1 table2 table3
Steve 25 0 40
Jim 50 35 90
Dave 0 12 12
Upvotes: 0
Views: 295
Reputation: 263301
First setting up a test case:
table1.txt <-"ID Value
Steve 25
Jim 50"
table2.txt<-"ID Value
Jim 35
Dave 12"
table3.txt<-"ID Value
Jim 90
Steve 40
Dave 12"
Now read in the data to a list of dataframes:
inp.list <- lapply( paste0("table", 1:3, ".txt"),
function(fnam) read.table(text = get(fnam), header=TRUE) )
The Reduce function is useful for "walking" along a list and applying a two argument function serially to the accumulated result and the next item in hte list. The all
-parameter means that the non-matches in the merge get NA's:
Reduce(function(x,y) {merge(x,y, by="ID", all=TRUE)} , inp.list)
#--------
ID Value.x Value.y Value
1 Dave NA 12 12
2 Jim 50 35 90
3 Steve 25 NA 40
I realize that I'm not completely answering the question about making the missing items = 0, but they are not zero... they are missing.
Upvotes: 0
Reputation: 34703
Using data.table
's rbindlist
, which has an idcol
argument that will help you out, and dcast
, whose fill
argument will flesh out the missing data.
library(data.table)
fl = paste0("table", 1:3)
files = setNames(paste0(fl, ".txt"), fl)
DT = dcast(rbindlist(lapply(files, fread), idcol = TRUE),
ID ~ .id, value.var = "Value", fill = 0L)
DT
# ID table1 table2 table3
# 1: Dave 0 12 12
# 2: Jim 50 35 90
# 3: Steve 25 0 40
lapply(files, fread)
returns each file as a data.table
in a list
. rbindlist
stacks these individual data.tables
, and idcol
(combined with being savvy enough to pass a named vector to lapply
makes sure we keep track of which file they came from. Then we reshape wide with dcast
, filling in missing combinations with 0
through the fill
argument.
Upvotes: 2