nietzschemouse
nietzschemouse

Reputation: 87

Combining multiple files in R by row, where each file becomes a column in the data frame

I couldn't find this question answered.

I am trying to read in multiple files that are very similar. They contain two rows, one ID and one Value. The files share most of the IDs, but there may be some variance.

I want to have one column of all of the IDs (non-repeating) and the rest of the columns contain the value that ID had in each file.

I've looked into using rbind and *apply functions, but it looks like these aren't going to do what I need.

Any ideas?

Data Example

table1.txt

ID      Value
Steve   25
Jim     50

table2.txt

ID      Value
Jim     35
Dave    12

table3.txt

ID      Value
Jim     90
Steve   40
Dave    12

With the end goal being

ID     table1     table2    table3
Steve  25         0         40
Jim    50         35        90
Dave   0          12        12

Upvotes: 0

Views: 295

Answers (2)

IRTFM
IRTFM

Reputation: 263301

First setting up a test case:

table1.txt <-"ID      Value
Steve   25
Jim     50"
table2.txt<-"ID      Value
Jim     35
Dave    12"
table3.txt<-"ID      Value
Jim     90
Steve   40
Dave    12"

Now read in the data to a list of dataframes:

inp.list <- lapply( paste0("table", 1:3, ".txt"), 
               function(fnam) read.table(text = get(fnam), header=TRUE)  )

The Reduce function is useful for "walking" along a list and applying a two argument function serially to the accumulated result and the next item in hte list. The all-parameter means that the non-matches in the merge get NA's:

Reduce(function(x,y) {merge(x,y, by="ID", all=TRUE)} , inp.list)
#--------
     ID Value.x Value.y Value
1  Dave      NA      12    12
2   Jim      50      35    90
3 Steve      25      NA    40

I realize that I'm not completely answering the question about making the missing items = 0, but they are not zero... they are missing.

Upvotes: 0

MichaelChirico
MichaelChirico

Reputation: 34703

Using data.table's rbindlist, which has an idcol argument that will help you out, and dcast, whose fill argument will flesh out the missing data.

library(data.table)
fl = paste0("table", 1:3)
files = setNames(paste0(fl, ".txt"), fl)

DT = dcast(rbindlist(lapply(files, fread), idcol = TRUE),
           ID ~ .id, value.var = "Value", fill = 0L)
DT
#       ID table1 table2 table3
# 1:  Dave      0     12     12
# 2:   Jim     50     35     90
# 3: Steve     25      0     40

lapply(files, fread) returns each file as a data.table in a list. rbindlist stacks these individual data.tables, and idcol (combined with being savvy enough to pass a named vector to lapply makes sure we keep track of which file they came from. Then we reshape wide with dcast, filling in missing combinations with 0 through the fill argument.

Upvotes: 2

Related Questions