Marie

How to extract specific rows in R?

I would like to extract specific rows from a data frame into a new data frame using R. I have two columns: City and Household. To detect moves, I want a new data frame containing only the households that do not always have the same city.

For example, if a household appears 3 times and at least one of its cities differs from the others, I keep it. Otherwise, I delete all 3 rows of this household.

    City      Household
   Paris              A
   Paris              A
    Nice              A
  Limoge              B
  Limoge              B
Toulouse              C
   Paris              C

Here, I want to keep only Household A and Household C.
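
For reference, here is a reproducible version of the example data. The name `df1` is an assumption chosen to match the first answer (the question does not name its data frame), and `stringsAsFactors = FALSE` is set so that both columns are character vectors in any R version.

```r
# Example data from the question; the name df1 is an assumption
# chosen to match the name used in the first answer.
df1 <- data.frame(
  City      = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  Household = c("A", "A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Households A and C have more than one distinct city; B does not
print(df1)
```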

Answers (2)

David Arenburg

A possible base R solution:

df1[with(df1, ave(as.character(City), Household, FUN=function(x) length(unique(x))) > 1L),]
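
To see how the `ave()` version works, it can be run step by step. `ave()` applies `FUN` within each `Household` group and returns a vector as long as its input, so every row receives its household's number of distinct cities. Because `ave()` keeps the type of its first argument, the counts come back as character strings here; the one-liner still works because `"2" > 1L` is compared as text, but converting with `as.integer()` makes the comparison explicit. The data setup repeats the question's example under the assumed name `df1`.

```r
df1 <- data.frame(
  City      = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  Household = c("A", "A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# One value per row: the number of distinct cities of that row's household.
# ave() returns the same type as its first argument, i.e. character here.
n_cities <- ave(as.character(df1$City), df1$Household,
                FUN = function(x) length(unique(x)))

# Convert the character counts to integers before comparing with a number
n_cities <- as.integer(n_cities)
print(n_cities)

# Keep the rows of households with more than one distinct city
moved <- df1[n_cities > 1L, ]
print(moved)
```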

Or

df1[df1$Household %in% names(which(table(unique(df1)$Household) > 1)),]
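
The `table()` version relies on `unique(df1)` dropping duplicated rows, which leaves one row per (City, Household) pair; counting the remaining rows per household then gives its number of distinct cities. This assumes `df1` has only these two columns, since any extra column would make otherwise identical rows distinct. The sketch below restricts `unique()` to the two relevant columns to avoid that, and again assumes the name `df1` for the question's data.

```r
df1 <- data.frame(
  City      = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  Household = c("A", "A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# One row per distinct (City, Household) pair; other columns are ignored
pairs <- unique(df1[c("City", "Household")])

# Number of distinct cities per household
counts <- table(pairs$Household)
print(counts)

# Names of the households that appear in more than one city
moved_ids <- names(which(counts > 1))

# Keep every original row belonging to those households
moved <- df1[df1$Household %in% moved_ids, ]
print(moved)
```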

Or a possible data.table solution (v >= 1.9.5, the development version at the time of writing):

library(data.table) # v >= 1.9.5; otherwise use length(unique(City))
setDT(df1)[, if(uniqueN(City) > 1L) .SD, by = Household]

Or

setDT(df1)[, .SD[uniqueN(City) > 1L], by = Household]
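
Both data.table calls return a household's rows (`.SD`) only when that group has more than one distinct city. Without data.table, the same split-apply-combine idea can be sketched in base R with `split()`, `Filter()` and `rbind()`. This is a swapped-in base R technique, not part of the data.table API, and `df1` again stands for the question's data.

```r
df1 <- data.frame(
  City      = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  Household = c("A", "A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# One data frame per household
groups <- split(df1, df1$Household)

# Keep only the groups whose City column has more than one distinct value
moved_groups <- Filter(function(d) length(unique(d$City)) > 1L, groups)

# Bind the surviving groups back together and reset the row names
moved <- do.call(rbind, moved_groups)
rownames(moved) <- NULL
print(moved)
```

Like `by = Household`, this returns the rows grouped by household, so the row order can differ from the original data when a household's rows are not contiguous.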

scoa

A dplyr solution: compute the number of distinct cities for each household and keep only the households with more than one.

library(dplyr)
df <- data.frame(city=c("Paris","Paris","Nice","Limoge","Limoge","Toulouse","Paris"),
                 household =c(rep("A",3),rep("B",2),rep("C",2)))

new_df <- df %>% group_by(household) %>%
  filter(n_distinct(city) > 1)

Source: local data frame [5 x 2]
Groups: household

      city household
1    Paris         A
2    Paris         A
3     Nice         A
4 Toulouse         C
5    Paris         C

Edit: added suggestions from @shadow and @davidarenburg in the comments.
