user124123

Reputation: 1683

Aggregate in R keeping only a single value

I have a data frame I would like to aggregate, removing the rows that are NA (or keeping only the unique rows) in the column I would like to use to aggregate.

For example, in the following I want to remove every row of the data frame for which week is NA, and keep the others unmodified:

  OTHER_REV        month quarter year       week       date       days daysinmonth
1          2785013     1 2009 Q1 2009 2009-01-05 2009-01-05 2009-01-05          31
2          2785013     1 2009 Q1 2009 2009-01-12 2009-01-05 2009-01-05          31
3          2785013     1 2009 Q1 2009 2009-01-19 2009-01-05 2009-01-05          31
4          2785013     1 2009 Q1 2009 2009-01-26 2009-01-05 2009-01-05          31
5          2785013     1  NA QNA 2009       <NA> 2009-01-16 2009-01-16          31
6          2785013     1  NA QNA 2009       <NA> 2009-01-17 2009-01-17          31

Producing:

  OTHER_REV        month quarter year       week       date       days daysinmonth
1          2785013     1 2009 Q1 2009 2009-01-05 2009-01-05 2009-01-05          31
2          2785013     1 2009 Q1 2009 2009-01-12 2009-01-05 2009-01-05          31
3          2785013     1 2009 Q1 2009 2009-01-19 2009-01-05 2009-01-05          31
4          2785013     1 2009 Q1 2009 2009-01-26 2009-01-05 2009-01-05          31

I have tried a combination of grep and unique(data$stuff), and I have also tried aggregate, but neither approach worked.

The following is str of the data:

'data.frame':   1896 obs. of  34 variables:
 $ OTHER_REV      : num  2785013 2785013 2785013 2785013 2785013 ...
 $ month                 : num  1 1 1 1 1 1 1 1 1 1 ...
 $ quarter               :Class 'yearqtr'  num [1:1896] 2009 2009 2009 2009 NA ...
 $ year                  : num  2009 2009 2009 2009 2009 ...
 $ week                  : Date, format: "2009-01-05" "2009-01-12" "2009-01-19" "2009-01-26" ...
 $ date                  : Date, format: "2009-01-05" "2009-01-05" "2009-01-05" "2009-01-05" ...
 $ days                  : Date, format: "2009-01-05" "2009-01-05" "2009-01-05" "2009-01-05" ...
 $ daysinmonth           : int  31 31 31 31 31 31 31 31 31 31 ...

Calling unique on df$week produces:

[1] "2009-01-05" "2009-01-12" "2009-01-19" "2009-01-26" NA           "2009-02-02"......

Upvotes: 0

Views: 164

Answers (1)

ctbrown

Reputation: 2361

Try this:

data[ ! is.na(data$week), ]

The equivalent for a data.table is a little simpler, since no trailing comma or data$ prefix is needed:

data[ ! is.na(week) ] 
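
For anyone who wants to try this out, here is a minimal sketch on a toy data frame. The column names follow the question, but the values are made up for illustration, and the names `keep`, `cleaned`, `cleaned2`, `dt` and `cleaned_dt` are just placeholders. The data.table part only runs if that package happens to be installed.

```r
# Toy data frame; column names follow the question, values are illustrative.
data <- data.frame(
  OTHER_REV = rep(2785013, 6),
  week = as.Date(c("2009-01-05", "2009-01-12", "2009-01-19",
                   "2009-01-26", NA, NA)),
  date = as.Date(c("2009-01-05", "2009-01-05", "2009-01-05",
                   "2009-01-05", "2009-01-16", "2009-01-17"))
)

# Logical index: TRUE where week is present, FALSE where it is NA.
keep <- !is.na(data$week)

# Base R: keep the rows with a non-missing week, and all columns.
cleaned <- data[keep, ]

# subset() is an equivalent base R spelling, handy for interactive use.
cleaned2 <- subset(data, !is.na(week))

# data.table form, run only if the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(data)
  cleaned_dt <- dt[!is.na(week)]
}
```

Note the trailing comma in `data[keep, ]`: on a plain data frame, a single index without the comma selects columns rather than rows.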

Upvotes: 6
