Reputation: 1895
I have a data frame that looks just like this (see link). I'd like to take the output that is produced below and go one step further by spreading the tone variable across both the n and the average variables. It seems like this topic might bear on this, but I can't get it to work: Is it possible to use spread on multiple columns in tidyr similar to dcast?
I'd like the final table to have the source variable in one column, then then the tone-n and tone-avg variables to be in columns. So I'd like the column headers to be "source" - "For - n" - "Against - n" "For -Avg" - "Against - Avg". This is for publication, not for further calculation, so it's about presenting data. It seems more intuitive to me to present data in this way. Thank you.
#variable1
Politician.For<-sample(seq(0,4,1),50, replace=TRUE)
#variable2
Politician.Against<-sample(seq(0,4,1),50, replace=TRUE)
#Variable3
Activist.For<-sample(seq(0,4,1),50,replace=TRUE)
#variable4
Activist.Against<-sample(seq(0,4,1),50,replace=TRUE)
#dataframe
df<-data.frame(Politician.For, Politician.Against, Activist.For,Activist.Against)
#tidyr
df %>%
#Gather all columns
gather(df) %>%
#separate by the period character
#(default separation character is non-alpha numeric characterr)
separate(col=df, into=c('source', 'tone')) %>%
#group by both source and tone
group_by(source,tone) %>%
#summarise to create counts and average
summarise(n=sum(value), avg=mean(value)) %>%
#try to spread
spread(tone, c('n', 'value'))
Upvotes: 5
Views: 1892
Reputation: 66819
Using data.table
syntax (thanks @akrun):
library(data.table)
dcast(
setDT(melt(df))[,c('source', 'tone'):=
tstrsplit(variable, '[.]')
][,list(
N = sum(value),
avg= mean(value))
,by=.(source, tone)],
source~tone,
value.var=c('N','avg'))
Upvotes: 1
Reputation: 7248
I think what you want is another gather to break out the count and mean as separate observations, the gather(type, val, -source, -tone)
below.
gather(df, who, value) %>%
separate(who, into=c('source', 'tone')) %>%
group_by(source, tone) %>%
summarise(n=sum(value), avg=mean(value)) %>%
gather(type, val, -source, -tone) %>%
unite(stat, c(tone, type)) %>%
spread(stat, val)
Yields
Source: local data frame [2 x 5]
source Against_avg Against_n For_avg For_n
1 Activist 1.82 91 1.84 92
2 Politician 1.94 97 1.70 85
Upvotes: 5