Reputation: 1895
I'm working with a data frame that looks like this. I would like to to look like:
province : district : party A votes : party A Percent : party B votes : party B percent : party C votes : party C percent
Right now the candidate name is functioning well as a unique identifier to avoid the need for an aggregation function, but I ultimately will be OK dropping it.
candidate<-c('bob jones', 'bobby jones', 'sara jones', 'sara norah', 'nora jones', 'other name', 'name other', 'thomas name', 'name judge', 'my mayor', 'peter peter', 'paul paul')
party<-rep(c('A', 'B', 'C'), 4)
district<-c(rep('District 1', 3), rep('District 2', 3), rep('District 3', 3), rep('Disctrict 4', 3))
province<-c(rep('Province 1', 3), rep('Province 2', 3), rep('Province 3', 3), rep('Province 4', 3))
votes<-round(rnorm(12, mean=5000, sd=1000),0)
percent<-round(rnorm(12, mean=37, sd=10),2)
df<-data.frame(party, district,province, votes, percent, candidate)
I'm workng with these commands
df.test<-melt(df, id.vars=c('candidate', 'province', 'district', 'party'))
dcast(df.test, candidate+province+district~variable+party, value.var=c('value'))
It's close, put it's not creating one row per district, it's creating four rows per district. The thing is: in my sample data set, when I drop 'candidate' from my casting call, this sample data set works great,e.g.
dcast(df.test, district~variable+party, value.var=c('value'))
But when I use the same call in my data set, I no longer have the unique identifier and this aggregates on length.
Hope you can help. Thanks.
Upvotes: 1
Views: 98
Reputation: 118799
In data.table v1.9.5
, dcast
can cast on multiple value.var
columns. With that you can do:
require(data.table) #v1.9.5+
ans = dcast(setDT(df), province + district ~ party, value.var = c("votes", "percent"))
# province district votes_A votes_B votes_C percent_A percent_B percent_C
# 1: Province 1 District 1 3072 3149 4262 34.29 18.45 19.20
# 2: Province 2 District 2 5918 3970 4201 36.56 46.22 43.16
# 3: Province 3 District 3 5593 5208 5260 26.58 31.20 39.00
# 4: Province 4 Disctrict 4 6138 4537 6293 43.97 43.62 32.48
If you'd like a data.frame
back, then you can do setDF(ans)
to convert ans
to data.frame
.
You can install v1.9.5
by following these instructions.
Upvotes: 1
Reputation: 20811
Here's a base solution:
set.seed(1)
candidate<-c('bob jones', 'bobby jones', 'sara jones', 'sara norah', 'nora jones', 'other name', 'name other', 'thomas name', 'name judge', 'my mayor', 'peter peter', 'paul paul')
party<-rep(c('A', 'B', 'C'), 4)
district<-c(rep('District 1', 3), rep('District 2', 3), rep('District 3', 3), rep('Disctrict 4', 3))
province<-c(rep('Province 1', 3), rep('Province 2', 3), rep('Province 3', 3), rep('Province 4', 3))
votes<-round(rnorm(12, mean=5000, sd=1000),0)
percent<-round(rnorm(12, mean=37, sd=10),2)
df<-data.frame(party, district,province, votes, percent, candidate)
reshape(df, direction = 'wide', times = c('votes','percent'),
idvar = c('province', 'district'),
timevar = 'party', drop = 'candidate')
# district province votes.A percent.A votes.B percent.B votes.C percent.C
# 1 District 1 Province 1 4374 30.79 5184 14.85 4164 48.25
# 4 District 2 Province 2 6595 36.55 5330 36.84 4180 46.44
# 7 District 3 Province 3 5487 45.21 5738 42.94 5576 46.19
# 10 Disctrict 4 Province 4 4695 44.82 6512 37.75 5390 17.11
Upvotes: 1