Reputation: 502
Hi, can you please explain how I can merge two tables so that they can be used to generate a pie chart?
#read input data
dat = read.csv("/ramdisk/input.csv", header = TRUE, sep="\t")
# pick needed columns and count the occurrences of each entry
df1 = table(dat[["C1"]])
df2 = table(dat[["C2"]])
# rename columns
names(df1) <- c("ID", "a", "b", "c", "d")
names(df2) <- c("ID", "e", "f", "g", "h")
# show data for testing purpose
df1
# ID a b c d
#241 18 17 28 29
df2
# ID e f g h
#230 44 8 37 14
# looks fine so far, now the problem:
# what I want to do is merge df1 and df2 into df
# so that df will contain the overall numbers of each entry
# df should print
# ID a b c d e f g h
#471 18 17 28 29 44 8 37 14
# need them to make a nice piechart in the end
#pie(df)
I assume it can be done with merge somehow, but I haven't found the right way. The closest solution I found was merge(df1,df2,all=TRUE), but it wasn't exactly what I needed.
Upvotes: 1
Views: 55
Reputation: 47350
I wrote the package safejoin, which handles this type of task in an intuitive way (I hope!). You just need a common id between your two tables (we'll use tibble::rowid_to_column for that), and then you can merge and resolve the column conflict with sum.
Using @pierre-lapointe's data:
library(tibble)
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
res <- safe_inner_join(rowid_to_column(df1),
                       rowid_to_column(df2),
                       by = "rowid",
                       conflict = sum)
res
# rowid ID a b c d e f g h
# 1 1 471 18 17 28 29 44 8 37 14
Then, for a given row (here the first and only one), you can get your pie chart by converting the row to a vector with unlist and dropping the first two elements, which are irrelevant:
pie(unlist(res[1,])[-(1:2)])
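For readers who cannot install safejoin, the same idea can be sketched in base R. This is only an illustration, not the package's method: it assumes df1 and df2 are one-row data frames as in @pierre-lapointe's data, that ID is the only shared column, and the name slices is made up for the example.

```r
# Sample one-row data frames, mirroring the numbers in the question.
df1 <- data.frame(ID = 241, a = 18, b = 17, c = 28, d = 29)
df2 <- data.frame(ID = 230, e = 44, f = 8, g = 37, h = 14)

# Sum the shared ID column and keep the remaining columns side by side.
res <- cbind(ID = df1$ID + df2$ID, df1[-1], df2[-1])

# Drop ID, flatten the single row to a named vector, and plot it.
slices <- unlist(res[1, -1])
pie(slices)
```

Unlike the join above, this sketch hard-codes which column is summed, so it would need adjusting if the two tables shared more columns.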
Upvotes: 0
Reputation: 887851
One approach is to stack each table, rbind the results, and then aggregate with sum:
out <- aggregate(values ~ ., rbind(stack(df1), stack(df2)), sum)
To get a named vector
with(out, setNames(values, ind))
Another approach is to concatenate the tables and then use tapply to sum by group:
v1 <- c(df1, df2)
tapply(v1, names(v1), sum)
Or with rowsum
rowsum(v1, group = names(v1))
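As a self-contained sketch of the tapply variant, the snippet below rebuilds the two count vectors by hand (the values are taken from the question; the names totals and slices are made up for the example) and then drops ID before plotting, since the ID total would otherwise dominate the pie.

```r
# Named count vectors standing in for the two renamed tables.
df1 <- c(ID = 241, a = 18, b = 17, c = 28, d = 29)
df2 <- c(ID = 230, e = 44, f = 8, g = 37, h = 14)

# Concatenate, then sum all entries that share a name (only ID here).
v1 <- c(df1, df2)
totals <- tapply(v1, names(v1), sum)

# Keep everything except the ID total and draw the chart.
slices <- totals[names(totals) != "ID"]
pie(slices)
```

Note that tapply orders its result by the sorted group names, so the position of ID in totals can depend on the locale; selecting by name rather than by position avoids relying on that order.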
Upvotes: 1
Reputation: 16277
Another approach would be to use rbindlist from data.table and colSums to get the totals. rbindlist with fill=TRUE accepts all columns, even if they are not present in both tables.
df1<-read.table(text="ID a b c d
241 18 17 28 29 ",header=TRUE)
df2<-read.table(text="ID e f g h
230 44 8 37 14" ,header=TRUE)
library(data.table)
setDT(df1)
setDT(df2)
res <- rbindlist(list(df1,df2), use.names=TRUE, fill=TRUE)
colSums(res, na.rm=TRUE)
#  ID   a   b   c   d   e   f   g   h
# 471  18  17  28  29  44   8  37  14
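If data.table is not available, the fill step can be approximated in base R. The following is a rough sketch of the same idea, not of how rbindlist is implemented; the helper pad and the variable all_cols are hypothetical names introduced here.

```r
# Same sample data as above, built directly as data frames.
df1 <- data.frame(ID = 241, a = 18, b = 17, c = 28, d = 29)
df2 <- data.frame(ID = 230, e = 44, f = 8, g = 37, h = 14)

# Every column name that appears in either table.
all_cols <- union(names(df1), names(df2))

# Hypothetical helper: add the missing columns as NA, then fix the column order.
pad <- function(d) {
  d[setdiff(all_cols, names(d))] <- NA
  d[all_cols]
}

# Stack the padded tables and total each column, ignoring the NA padding.
stacked <- rbind(pad(df1), pad(df2))
totals <- colSums(stacked, na.rm = TRUE)

# Drop ID before plotting so that only the category counts are drawn.
pie(totals[-1])
```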
Upvotes: 0