Reputation: 281
I have the following data frame, call it df, which consists of three columns: "Scene", "Name", and "Appearances". For each name, I would like to total the "Appearances" values over every row in which that name occurs and divide by the number of times the name occurs (that is, the mean Appearances per name). Then I want to remove from df all the rows for which that value is less than 2.
So for example, here in df, every row would be tossed out except John's and Hitler's, whose values are calculated as (2+2)/2 = 2 and (4+1)/2 = 2.5:
Scene Name Appearances
112 Hamlet 1
113 Zyklon 1
114 Hitler 4
115 Chamberlain 1
115 Hitler 1
117 Gospel 1
117 John 2
117 Deussen 1
118 Plato 1
118 John 2
118 Hegel 1
119 Cankara 1
120 Freud 1
121 Freud 1
122 Petersbourg 1
I have tried a couple of things, using multiplication instead, but both are mathematically wrong and return errors.
First, I tried to turn df into a two-way table and delete the entries belonging to an infrequent name:
removeinfreqs <- function(df){
x <- table(df$Name, df$Appearances)
d<-df[(df$Name %in% names * df$Appearances)/df$Name %in% names(x[x >= 3]), ]
d
}
but I got an error: "Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments"
I tried the same sort of thing with the subset command:
df_less<-subset(df, df$Name %in% names * df$Appearances/df$Name %in% names >= 3)
But I get the same error: "Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments"
I have very little experience working with data frames in R. How can I perform this operation? Any help is greatly appreciated.
Upvotes: 1
Views: 1532
Reputation: 193677
Here's an alternative with "data.table":
library(data.table)
DT <- data.table(df)
DT[, if(mean(Appearances) >= 2) .SD, by = Name]
# Name Scene Appearances
# 1: Hitler 114 4
# 2: Hitler 115 1
# 3: John 117 2
# 4: John 118 2
(Hat tip to @thelatemail/@mnel.)
Upvotes: 1
Reputation: 81733
First, calculate the mean Appearances value for each Name:
meanAp <- with(df, ave(Appearances, Name, FUN = mean))
Second, extract rows:
df[meanAp >= 2, ]
# Scene Name Appearances
# 3 114 Hitler 4
# 5 115 Hitler 1
# 7 117 John 2
# 10 118 John 2
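For anyone who wants to run the two steps end to end, here is a minimal, self-contained sketch. It assumes the sample data from the question is rebuilt by hand with data.frame(); the object name kept is only illustrative and does not come from the original post.

```r
# Rebuild the sample data from the question (values copied from the post).
df <- data.frame(
  Scene = c(112, 113, 114, 115, 115, 117, 117, 117,
            118, 118, 118, 119, 120, 121, 122),
  Name = c("Hamlet", "Zyklon", "Hitler", "Chamberlain", "Hitler",
           "Gospel", "John", "Deussen", "Plato", "John",
           "Hegel", "Cankara", "Freud", "Freud", "Petersbourg"),
  Appearances = c(1, 1, 4, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# ave() returns one value per row: the mean of Appearances
# within that row's Name group.
meanAp <- ave(df$Appearances, df$Name, FUN = mean)

# Keep only the rows whose group mean is at least 2.
# "kept" is a hypothetical name used for illustration.
kept <- df[meanAp >= 2, ]
kept
```

Because ave() returns a vector as long as the input, the comparison meanAp >= 2 can index the rows of df directly, with no merge or lookup by name.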
Upvotes: 3