Anna Rouw
Anna Rouw

Reputation: 99

Filter a dataframe based on a condition of another column

I am trying to filter my dataframe. Basically my data looks like this:

year   value
2010    '-'
2010   '$64'
2011    '-'
2011   '$50'
2012    '-'
2013    '-'
2013   '$87'

And I want to get rid of the '-' values unless it is the only value for one year. So the output would look something like this:

year    value
2010    '$64'
2011    '$50'
2012     '-'
2013    '$87'

Any ideas would be greatly appreciated! I'm sure this is a simple solution, but I'm having trouble figuring it out.

Upvotes: 2

Views: 92

Answers (3)

Duck
Duck

Reputation: 39585

A step by step solution with base Rwould be:

#Create unique values
dfun <- aggregate(value~year,df,function(x) length(unique(x)))
#Add to original dataframe
df$Var <- dfun[match(df$year,dfun$year),"value"]
#Create a flag
df$Flag=ifelse(df$value=="'-'" & df$Var>1,1,0)
#Filter
df2 <- subset(df,Flag==0)
df2$Var <- NULL
df2$Flag <- NULL

Output:

  year value
2 2010 '$64'
4 2011 '$50'
5 2012   '-'
7 2013 '$87'

Some data used:

#Data
df <- structure(list(year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2013L, 
2013L), value = c("'-'", "'$64'", "'-'", "'$50'", "'-'", "'-'", 
"'$87'"), Var = c(2L, 2L, 2L, 2L, 1L, 2L, 2L), Flag = c(1, 0, 
1, 0, 0, 1, 0)), row.names = c(NA, -7L), class = "data.frame")

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101034

A base R option with subset + ave

subset(
  df,
  as.logical(
    ave(value,
      year,
      FUN = function(x) length(unique(x)) > 1 & x != "-" | length(unique(x)) == 1
    )
  )
)

which gives

  year value
2 2010   $64
4 2011   $50
5 2012     -
7 2013   $87

Upvotes: 1

akrun
akrun

Reputation: 886938

Here is an option with filter after grouping by 'year'. We would check whether the value is not equal to - and the number of distinct elements are greater than 1 or (|) if there is a single element (n() == 1) and that value is '-'

library(dplyr)
df1 %>%
   group_by(year) %>%
   filter( value != '-' & n_distinct(value) > 1| (value == '-') & n() == 1)
# A tibble: 4 x 2
# Groups:   year [4]
#   year value
#  <int> <chr>
#1  2010 $64  
#2  2011 $50  
#3  2012 -    
#4  2013 $87  

Or a slightly more compact option is to get the rows where 'value' is not equal to '-' or (|) where the number of elements that are - in 'value' is equal to the number of rows

df1 %>% 
  group_by(year) %>%
  filter( sum(value == '-') == n()| value != '-')

data

df1 <- structure(list(year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2013L, 
2013L), value = c("-", "$64", "-", "$50", "-", "-", "$87")),
class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 3

Related Questions