Robert

Reputation: 530

Conditional Summary in R: MaxSum

I have a data frame of authors (from a much larger data set than the example below) that I'd like to get better descriptive statistics for. I know (kind of) how to use maxsum, but how could I get a summary of the unique authors EXCEPT for, say, the top 2 most frequent authors? How would I then determine the new maxsum? And how would I get the actual summary for the remaining authors (where the new maxsum would be 3), rather than just that number?

I'm basically looking for conditional ways of summarizing my data. Can anyone help me out in this department?

dat <- data.frame(author = c("a", "b", "c", "d", "a", "b", "c", "d", "e",
                             "a", "a", "a", "a", "a", "c", "c", "c", "c"),
                  Post = rep("one", 18))
authors <- dat[, 1]
author_vec <- as.factor(authors)  # factor, so summary() gives per-author counts
length(unique(author_vec)) #5
ex_s <- summary(author_vec, maxsum = 5)
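For reference, here is a small sketch of what maxsum actually does on the example data: summary() on a factor keeps the maxsum - 1 most frequent levels and lumps all remaining levels into "(Other)". So maxsum on its own can only collapse the least frequent authors; it cannot exclude the most frequent ones.

```r
# Same example data as above, as a factor
author_vec <- as.factor(c("a", "b", "c", "d", "a", "b", "c", "d", "e",
                          "a", "a", "a", "a", "a", "c", "c", "c", "c"))

# With maxsum = 3, summary() keeps the 2 most frequent levels (a and c)
# and adds the counts of b, d and e together under "(Other)"
s3 <- summary(author_vec, maxsum = 3)
s3
# a = 7, c = 6, (Other) = 5
```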

Upvotes: 2

Views: 539

Answers (2)

Tim Biegeleisen

Reputation: 521639

Here is an approach using the plyr library:

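library(plyr)
# count posts per author
temp <- ddply(dat, ~author, summarise, sum = length(author))
# sort by count (descending), then drop the first two rows (the two most frequent authors)
temp <- temp[order(-temp$sum), ][3:nrow(temp), ]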

> temp
  author sum
2      b   2
4      d   2
5      e   1

The authors a and c have been removed because they were the two most frequently appearing authors in the data set.
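If you'd rather not depend on plyr, here is a base-R sketch of the same idea (my own variant, not part of the answer above) using table() and sort(). The name temp_base is just for illustration.

```r
dat <- data.frame(author = c("a", "b", "c", "d", "a", "b", "c", "d", "e",
                             "a", "a", "a", "a", "a", "c", "c", "c", "c"),
                  stringsAsFactors = FALSE)

# Count posts per author, most frequent first
counts <- sort(table(dat$author), decreasing = TRUE)

# Drop the two most frequent authors and return a data frame like ddply's
rest <- counts[-(1:2)]
temp_base <- data.frame(author = names(rest), sum = as.integer(rest),
                        stringsAsFactors = FALSE)
temp_base
```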

Upvotes: 1

IRTFM

Reputation: 263381

It wasn't clear how many authors you expected after excluding the top 2. This assumes you wanted the next three by frequency (since you said you understood how maxsum was acting). If you wanted the next five, then add two to your current maxsum:

# sort the per-author counts in decreasing order, then drop the top 2
ex_s <- sort(summary(as.factor(author_vec), maxsum = 5), decreasing = TRUE)[-(1:2)]
ex_s
#------
b d e 
2 2 1 
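To also get the new maxsum the question asks about, the steps can be wrapped in a small function. This is only a sketch: summary_excluding_top() is a hypothetical helper name, not part of base R or either answer. It drops the n_top most frequent authors, counts the distinct authors that remain (the new maxsum), and returns their summary.

```r
# Hypothetical helper: summarize a factor after removing its n_top most
# frequent levels; also reports how many distinct levels remain.
summary_excluding_top <- function(x, n_top = 2) {
  x <- as.factor(x)
  counts <- sort(table(x), decreasing = TRUE)        # most frequent first
  top <- names(counts)[seq_len(min(n_top, length(counts)))]
  rest <- droplevels(x[!x %in% top])                 # remove the top levels entirely
  new_maxsum <- nlevels(rest)                        # distinct authors that remain
  list(new_maxsum = new_maxsum,
       summary = summary(rest, maxsum = max(new_maxsum, 1L)))
}

dat <- data.frame(author = c("a", "b", "c", "d", "a", "b", "c", "d", "e",
                             "a", "a", "a", "a", "a", "c", "c", "c", "c"))
res <- summary_excluding_top(dat$author, n_top = 2)
res$new_maxsum   # 3
res$summary      # b = 2, d = 2, e = 1
```

Ties at the cutoff are broken by level order (alphabetical here), because table() lists levels in order and sort() leaves tied counts in their original order.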

Upvotes: 0
