Fendi

Reputation: 83

Delete rows for ids that don't appear in every month in R

I want to delete the rows of any id that doesn't appear in EVERY month of the month column. The dataset is as follows:

id month  
1 01  
2 01  
3 01  
1 02  
2 02  
1 03  
2 03  

I want to delete id = 3 from the dataset, since it appears only in month = 01 and is missing from months 02 and 03.

I'm using R.
Thank you for helping.

Upvotes: 1

Views: 100

Answers (2)

akrun

Reputation: 886938

Using dplyr

library(dplyr)
df %>% 
  group_by(id) %>%
  # keep ids whose number of distinct months equals the overall number
  filter(n_distinct(month) == n_distinct(df$month)) %>% 
  ungroup()

-output

# A tibble: 6 × 2
     id month
  <int> <int>
1     1     1
2     2     1
3     1     2
4     2     2
5     1     3
6     2     3

Or using data.table

library(data.table)
# compare each id's months with all months in the table (.SD is only the current group)
data_hh[, if (uniqueN(month) == uniqueN(data_hh$month)) .SD, by = id]

data

data_hh <- structure(list(id = c(18354L, 18815L, 19014L, 63960L, 72996L, 
73930L), month = c(1, 1, 1, 1, 1, 1), value = c(113.33, 251.19, 
160.15, 278.8, 254.39, 733.22), x1 = c(96.75, 186.78, 106.02, 
195.23, 184.57, 473.92), x2 = c(1799.1, 5399.1, 1799.1, 1349.1, 
2924.1, 2024.1), x3 = c(85.37, 74.36, 66.2, 70.02, 72.55, 64.63
), x4 = c(6.29, 4.65, 8.9, 20.66, 8.69, 36.22)), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"))
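The dput above contains only month 1, so it cannot reproduce the case in the question. Below is a minimal sketch of the question's sample data, assuming both columns are integers (as in the tibble output above) and using the name df to match the dplyr code. The base R check at the end applies the same idea as the filter: count the distinct months per id and compare with the overall count. The names months_per_id and complete are illustrative.

```r
# Sample data from the question; integer columns are an assumption
df <- data.frame(
  id    = c(1L, 2L, 3L, 1L, 2L, 1L, 2L),
  month = c(1L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Number of distinct months each id appears in, repeated on every row of that id
months_per_id <- ave(df$month, df$id, FUN = function(m) length(unique(m)))

# Keep only rows whose id is seen in every month present in the data
complete <- df[months_per_id == length(unique(df$month)), ]
complete
```

Here id = 3 is dropped because it is seen in one month while the data contain three.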

Upvotes: 0

Sotos

Reputation: 51582

You can split the dataset and use Reduce, i.e.

remove <- Reduce(setdiff, split(df$id, df$month))
df[!df$id %in% remove,]

  id month
1  1     1
2  2     1
4  1     2
5  2     2
6  1     3
7  2     3

As @jay.sf mentioned, you need to assign the result back to your data frame:

df <- df[!df$id %in% remove,]
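One caveat, as far as I can tell: Reduce(setdiff, ...) starts from the ids in the first month and drops any that show up in a later month. It finds id = 3 in the sample data, but it would miss an id present in months 1 and 3 only, or one that first appears in month 2. A sketch of a more general variant intersects the per-month id sets instead. The data and the names df2, complete_ids and df2_complete are hypothetical, made up for illustration.

```r
# Hypothetical data: id 3 skips month 2, id 4 first appears in month 2
df2 <- data.frame(
  id    = c(1, 2, 3, 1, 2, 4, 1, 2, 3, 4),
  month = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# ids present in every month: intersect the id sets of all months
complete_ids <- Reduce(intersect, split(df2$id, df2$month))

# Keep only rows whose id appears in all months
df2_complete <- df2[df2$id %in% complete_ids, ]
df2_complete
```

On this data the setdiff version returns an empty vector, so nothing would be removed, while the intersect version keeps only ids 1 and 2.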

Upvotes: 2
