spindoctor
spindoctor

Reputation: 1895

Using spread to create two value columns with tidyr

I have a data frame that looks just like this (see link). I'd like to take the output that is produced below and go one step further by spreading the tone variable across both the n and the average variables. It seems like this topic might bear on this, but I can't get it to work: Is it possible to use spread on multiple columns in tidyr similar to dcast?

I'd like the final table to have the source variable in one column, then then the tone-n and tone-avg variables to be in columns. So I'd like the column headers to be "source" - "For - n" - "Against - n" "For -Avg" - "Against - Avg". This is for publication, not for further calculation, so it's about presenting data. It seems more intuitive to me to present data in this way. Thank you.

#variable1
Politician.For<-sample(seq(0,4,1),50, replace=TRUE)
#variable2
Politician.Against<-sample(seq(0,4,1),50, replace=TRUE)
#Variable3
Activist.For<-sample(seq(0,4,1),50,replace=TRUE)
#variable4
Activist.Against<-sample(seq(0,4,1),50,replace=TRUE)
#dataframe
df<-data.frame(Politician.For, Politician.Against, Activist.For,Activist.Against)

#tidyr
df %>%
 #Gather all columns 
 gather(df) %>%
 #separate by the period character 
 #(default separation character is non-alpha numeric characterr) 
 separate(col=df, into=c('source', 'tone')) %>%
 #group by both source and tone  
 group_by(source,tone) %>%
 #summarise to create counts and average
 summarise(n=sum(value), avg=mean(value)) %>%
 #try to spread
 spread(tone, c('n', 'value'))

Upvotes: 5

Views: 1892

Answers (2)

Frank
Frank

Reputation: 66819

Using data.table syntax (thanks @akrun):

library(data.table)
dcast(
  setDT(melt(df))[,c('source', 'tone'):=
      tstrsplit(variable, '[.]')
    ][,list(
      N  = sum(value),
      avg= mean(value))
    ,by=.(source, tone)],
  source~tone,
  value.var=c('N','avg'))

Upvotes: 1

user295691
user295691

Reputation: 7248

I think what you want is another gather to break out the count and mean as separate observations, the gather(type, val, -source, -tone) below.

gather(df, who, value) %>%
    separate(who, into=c('source', 'tone')) %>%
    group_by(source, tone) %>%
    summarise(n=sum(value), avg=mean(value)) %>%
    gather(type, val, -source, -tone) %>%
    unite(stat, c(tone, type)) %>%
    spread(stat, val)

Yields

Source: local data frame [2 x 5]

      source Against_avg Against_n For_avg For_n
1   Activist        1.82        91    1.84    92
2 Politician        1.94        97    1.70    85

Upvotes: 5

Related Questions