Reputation: 1217
Say I have a dataframe, df, that looks like this:
timestamp residence
2014/01/29 10:46:46 PM EST Virginia, USA
2014/01/29 10:51:01 PM EST Maryland, USA
2014/01/29 10:54:08 PM EST Massachusetts, USA
2014/01/29 10:55:00 PM EST Indiana, USA
2014/01/29 11:02:31 PM EST Michigan, USA
2014/01/29 11:19:42 PM EST Virginia, USA
Now I want to create a new dataframe, df.count, with one column listing each unique string in df$residence exactly once and a second column giving the number of times each string occurs in df$residence. This is similar to
table(df$residence)
but the output format would instead look like:
residence count
Virginia, USA 2
Maryland, USA 1
Massachusetts, USA 1
Indiana, USA 1
Michigan, USA 1
Upvotes: 0
Views: 105
Reputation: 81693
Another solution with aggregate:
setNames(aggregate(seq(nrow(df)) ~ residence,df, length), c("residence","count"))
residence count
1 Indiana, USA 1
2 Maryland, USA 1
3 Massachusetts, USA 1
4 Michigan, USA 1
5 Virginia, USA 2
Upvotes: 1
Reputation: 44320
I suppose you could use table to build this new data frame:
tab <- table(df$residence)
data.frame(residence=names(tab), count=as.vector(tab))
# residence count
# 1 Indiana, USA 1
# 2 Maryland, USA 1
# 3 Massachusetts, USA 1
# 4 Michigan, USA 1
# 5 Virginia, USA 2
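A slightly shorter variant, offered as a sketch: as.data.frame on a table accepts a responseName argument, and naming the argument passed to table names the first column. The sample df below is re-typed from the question (residence column only), and the factor(..., levels = unique(...)) step is an assumption that you want rows in order of first appearance, as in the question's desired output, rather than sorted alphabetically.

```r
# Sample data re-typed from the question (timestamps omitted)
df <- data.frame(
  residence = c("Virginia, USA", "Maryland, USA", "Massachusetts, USA",
                "Indiana, USA", "Michigan, USA", "Virginia, USA"),
  stringsAsFactors = FALSE
)

# Fix the level order to first appearance so table() does not sort alphabetically
res <- factor(df$residence, levels = unique(df$residence))

# Naming the table() argument names the first column;
# responseName names the column of counts
df.count <- as.data.frame(table(residence = res), responseName = "count")
df.count
```

Note that the residence column comes back as a factor here; wrap it in as.character() if you need plain strings.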
Upvotes: 2
Reputation: 52637
If you're okay with residence as the row names only:
with(df, data.frame(count=tapply(residence, residence, length)))
If you want an actual column with residence:
with(df, {
  summ <- tapply(residence, residence, length)
  data.frame(residence = names(summ), count = summ)
})
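One thing to watch with the tapply approach: the named result may carry the residence strings along as row names in addition to the new column. A minimal sketch that strips the names with as.vector, assuming you want plain integer row names; the sample df is re-typed from the question and the name out is just illustrative.

```r
# Sample data re-typed from the question (timestamps omitted)
df <- data.frame(
  residence = c("Virginia, USA", "Maryland, USA", "Massachusetts, USA",
                "Indiana, USA", "Michigan, USA", "Virginia, USA"),
  stringsAsFactors = FALSE
)

out <- with(df, {
  # Count rows per residence; groups come back in alphabetical order
  summ <- tapply(residence, residence, length)
  # as.vector() drops the names so they are not reused as row names
  data.frame(residence = names(summ), count = as.vector(summ))
})
out
```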
Upvotes: 1