Reputation: 1667
I understand that this is quite a simple question, but I haven't been able to find an answer to it.
I have a data frame that gives the id of a person and their hobby. Since a person may have many hobbies, the id may be repeated across multiple rows, each with a different hobby. I have been trying to print only the rows for ids that have more than one hobby. I was able to get the frequencies using table().
But how do I apply a condition so that only rows whose id has a frequency greater than one are printed?
Secondly, is there a better way to find the frequencies without using table()?
This is my attempt with table(), without the filter for frequency greater than one:
> id=c(1,2,2,3,2,4,3,1)
> hobby = c('play','swim','play','movies','golf','basketball','playstation','gameboy')
> df = data.frame(id, hobby)
> table(df$id)
1 2 3 4
2 3 2 1
Upvotes: 1
Views: 3445
Reputation: 166
This example assumes you are trying to filter df:
id=c(1,2,2,3,2,4,3,1)
hobby = c('play','swim','play','movies','golf','basketball',
'playstation','gameboy')
df = data.frame(id, hobby)
table(df$id)
Get all the ids that have more than one hobby:
tmp <- as.data.frame(table(df$id))
tmp <- tmp[tmp$Freq > 1,]
Using that information, select the rows for those ids in df:
df1 <- df[df$id %in% tmp$Var1,]
df1
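On the second part of the question (counting without table()), here is a minimal sketch that assumes base R's ave() is acceptable. The names n_per_id and df_multi are made up for illustration; they are not from the question.

```r
# Same sample data as in the question
id <- c(1, 2, 2, 3, 2, 4, 3, 1)
hobby <- c("play", "swim", "play", "movies",
           "golf", "basketball", "playstation", "gameboy")
df <- data.frame(id, hobby)

# ave() applies length() within each id group and returns one value per row,
# so every row carries the number of rows that share its id
n_per_id <- ave(seq_along(df$id), df$id, FUN = length)

# Keep only rows whose id occurs more than once (illustrative name)
df_multi <- df[n_per_id > 1, ]
df_multi
```

Because the count is aligned with the rows of df, the filter is a single logical index and the original row order is kept.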
Upvotes: 1
Reputation: 3634
Try using data.table; I find it more readable than using table():
library(data.table)
id=c(1,2,2,3,2,4,3,1)
hobby = c('play','swim','play','movies',
'golf','basketball','playstation','gameboy')
df = data.frame(id=id, hobby=hobby)
dt = as.data.table(df)
dt[,hobbies:=.N, by=id]
For your condition, you will get:
> dt[hobbies >1,]
id hobby hobbies
1: 1 play 2
2: 2 swim 3
3: 2 play 3
4: 3 movies 2
5: 2 golf 3
6: 3 playstation 2
7: 1 gameboy 2
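If you would rather not add a count column, the same filter can probably be written in one grouped step. This is a sketch under the assumption that the data.table package is installed; the name multi is made up for illustration.

```r
library(data.table)

dt <- data.table(
  id = c(1, 2, 2, 3, 2, 4, 3, 1),
  hobby = c("play", "swim", "play", "movies",
            "golf", "basketball", "playstation", "gameboy")
)

# .N is the number of rows in the current group and .SD is that group's rows;
# groups with a single row return NULL and are dropped from the result
multi <- dt[, if (.N > 1) .SD, by = id]
multi
```

Note that the result comes back grouped by id (in order of first appearance) rather than in the original row order, and dt itself is left unchanged.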
Upvotes: 3