Mon

Reputation: 281

Remove rows from data frame if values added together is less than x

I have the following data frame, call it df, with three columns: "Scene", "Name", and "Appearances". For each name, I would like to total the "Appearances" values across every row in which that name occurs and divide the total by the number of rows in which the name occurs (that is, take the mean of "Appearances" per name). Then I want to remove from df all rows whose name has a mean below 2.

So, for example, every row in df would be dropped except John's and Hitler's, whose values are (2+2)/2 = 2 and (4+1)/2 = 2.5.

Scene      Name   Appearances 
112       Hamlet         1  
113       Zyklon         1 
114       Hitler         4  
115  Chamberlain         1  
115       Hitler         1  
117       Gospel         1  
117         John         2  
117      Deussen         1  
118        Plato         1 
118         John         2  
118        Hegel         1  
119      Cankara         1  
120        Freud         1  
121        Freud         1  
122  Petersbourg         1 
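
To make the example reproducible, df can be rebuilt roughly as follows. The column types (numeric "Scene" and "Appearances", character "Name") are an assumption, since only the printed form is shown above.

```r
# Rebuild the example data frame from the printed table.
# Column types are assumed; the original object is not available.
df <- data.frame(
  Scene = c(112, 113, 114, 115, 115, 117, 117, 117,
            118, 118, 118, 119, 120, 121, 122),
  Name = c("Hamlet", "Zyklon", "Hitler", "Chamberlain", "Hitler",
           "Gospel", "John", "Deussen", "Plato", "John",
           "Hegel", "Cankara", "Freud", "Freud", "Petersbourg"),
  Appearances = c(1, 1, 4, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
```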

I have tried a couple of things, using multiplication instead of a sum, but both are mathematically wrong and return errors.

First, I tried to turn df into a two-way table and delete the entries belonging to an infrequent name:

removeinfreqs <- function(df){
x <- table(df$Name, df$Appearances)
d<-df[(df$Name %in% names * df$Appearances)/df$Name %in% names(x[x >= 3]), ]
d
}

but I got an error: "Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments"

I tried the same sort of thing with the subset command:

df_less<-subset(df, df$Name %in% names * df$Appearances/df$Name %in% names >= 3)

But I get the same error: "Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments"

I have very little experience working with data frames in R. How can I perform this operation? Any help is greatly appreciated.

Upvotes: 1

Views: 1532

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193677

Here's an alternative with "data.table":

library(data.table)
DT <- data.table(df)

DT[, if(mean(Appearances) >= 2) .SD, by = Name]
#      Name Scene Appearances
# 1: Hitler   114           4
# 2: Hitler   115           1
# 3:   John   117           2
# 4:   John   118           2

(Hat tip to @thelatemail/@mnel.)
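
The idea behind that one-liner is split-apply-combine: each Name forms a group, and a group is returned only when its mean passes the test. As a rough illustration of the same logic in base R (not the data.table implementation itself), here is a minimal sketch. The df construction is an assumed reconstruction of the question's table, and `groups`, `kept`, and `result` are names made up for this example.

```r
# Assumed reconstruction of the data frame from the question.
df <- data.frame(
  Scene = c(112, 113, 114, 115, 115, 117, 117, 117,
            118, 118, 118, 119, 120, 121, 122),
  Name = c("Hamlet", "Zyklon", "Hitler", "Chamberlain", "Hitler",
           "Gospel", "John", "Deussen", "Plato", "John",
           "Hegel", "Cankara", "Freud", "Freud", "Petersbourg"),
  Appearances = c(1, 1, 4, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Split the rows into one data frame per Name.
groups <- split(df, df$Name)

# Keep only the groups whose mean Appearances is at least 2.
kept <- Filter(function(g) mean(g$Appearances) >= 2, groups)

# Stack the surviving groups back into a single data frame.
result <- do.call(rbind, kept)
result
```

Unlike the data.table version, this keeps the original column order and produces row names that combine the group name with the original row number.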

Upvotes: 1

Sven Hohenstein

Reputation: 81733

First, calculate the mean Appearances value for each Name (ave returns one value per row, repeated within each group):

meanAp <- with(df, ave(Appearances, Name, FUN = mean))

Second, extract the rows whose group mean is at least 2:

df[meanAp >= 2, ]

#    Scene   Name Appearances
# 3    114 Hitler           4
# 5    115 Hitler           1
# 7    117   John           2
# 10   118   John           2
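
To see why this works, it can help to look at the intermediate vector. The sketch below is self-contained; the df construction is an assumed reconstruction of the question's table, and `perName` and `kept` are names introduced only for illustration.

```r
# Assumed reconstruction of the data frame from the question.
df <- data.frame(
  Scene = c(112, 113, 114, 115, 115, 117, 117, 117,
            118, 118, 118, 119, 120, 121, 122),
  Name = c("Hamlet", "Zyklon", "Hitler", "Chamberlain", "Hitler",
           "Gospel", "John", "Deussen", "Plato", "John",
           "Hegel", "Cankara", "Freud", "Freud", "Petersbourg"),
  Appearances = c(1, 1, 4, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# One mean per row: every row of a Name carries that Name's group mean,
# so the vector lines up with df and can be used directly as a row filter.
meanAp <- with(df, ave(Appearances, Name, FUN = mean))

# One mean per Name, for comparison (a named vector, not aligned to rows).
perName <- tapply(df$Appearances, df$Name, mean)

# Rows whose group mean is at least 2.
kept <- df[meanAp >= 2, ]
kept
```

Because ave returns a vector of the same length as its input, no merge or lookup step is needed before subsetting.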

Upvotes: 3
