Reputation: 99
I am trying to filter my dataframe. Basically my data looks like this:
year value
2010 '-'
2010 '$64'
2011 '-'
2011 '$50'
2012 '-'
2013 '-'
2013 '$87'
And I want to get rid of the '-' values unless it is the only value for one year. So the output would look something like this:
year value
2010 '$64'
2011 '$50'
2012 '-'
2013 '$87'
Any ideas would be greatly appreciated! I'm sure this is a simple solution, but I'm having trouble figuring it out.
Upvotes: 2
Views: 92
Reputation: 39585
A step by step solution with base R
would be:
#Create unique values
dfun <- aggregate(value~year,df,function(x) length(unique(x)))
#Add to original dataframe
df$Var <- dfun[match(df$year,dfun$year),"value"]
#Create a flag
df$Flag=ifelse(df$value=="'-'" & df$Var>1,1,0)
#Filter
df2 <- subset(df,Flag==0)
df2$Var <- NULL
df2$Flag <- NULL
Output:
year value
2 2010 '$64'
4 2011 '$50'
5 2012 '-'
7 2013 '$87'
Some data used:
#Data
df <- structure(list(year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2013L,
2013L), value = c("'-'", "'$64'", "'-'", "'$50'", "'-'", "'-'",
"'$87'"), Var = c(2L, 2L, 2L, 2L, 1L, 2L, 2L), Flag = c(1, 0,
1, 0, 0, 1, 0)), row.names = c(NA, -7L), class = "data.frame")
Upvotes: 1
Reputation: 101034
A base R option with subset
+ ave
subset(
df,
as.logical(
ave(value,
year,
FUN = function(x) length(unique(x)) > 1 & x != "-" | length(unique(x)) == 1
)
)
)
which gives
year value
2 2010 $64
4 2011 $50
5 2012 -
7 2013 $87
Upvotes: 1
Reputation: 886938
Here is an option with filter
after grouping by 'year'. We would check whether the value is not equal to -
and the number of distinct elements are greater than 1 or (|
) if there is a single element (n() == 1
) and that value is '-'
library(dplyr)
df1 %>%
group_by(year) %>%
filter( value != '-' & n_distinct(value) > 1| (value == '-') & n() == 1)
# A tibble: 4 x 2
# Groups: year [4]
# year value
# <int> <chr>
#1 2010 $64
#2 2011 $50
#3 2012 -
#4 2013 $87
Or a slightly more compact option is to get the rows where 'value' is not equal to '-' or (|
) where the number of elements that are -
in 'value' is equal to the number of rows
df1 %>%
group_by(year) %>%
filter( sum(value == '-') == n()| value != '-')
df1 <- structure(list(year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2013L,
2013L), value = c("-", "$64", "-", "$50", "-", "-", "$87")),
class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 3