Reputation: 151
I have data in R that looks like this:
Cnty Yr Plt Spp DBH Ht Age
1 185 1999 20001 Bitternut 8.0 54 47
2 185 1999 20001 Bitternut 7.2 55 50
3 31 1999 20001 Pignut 7.4 71 60
4 31 1999 20001 Pignut 11.4 85 114
5 189 1999 20001 WO 14.5 80 82
6 189 1999 20001 WO 12.1 72 79
I would like to know the quantity of unique species (Spp) in each county (Cnty). "unique(dfname$Spp)" gives me a total count of unique species in the data frame, but I would like it by county.
Any help is appreciated! Sorry for the weird formatting, this is my first ever question on SO.
Thanks.
Upvotes: 15
Views: 49463
Reputation: 6416
set.seed(1)
mydf <- data.frame(
Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
Yr = c(rep(c("1999", "2000"), times = c(3, 2)),
"1999", "1999", "2000", "2000", "2000"),
Plt = "20001",
Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
DBH = runif(10, 0, 15)
)
mydf
The dplyr package could help here:
library(dplyr)
mydf %>%
group_by(Cnty) %>%
summarise(un_Spp = length(unique(Spp)))
#> # A tibble: 3 × 2
#> Cnty un_Spp
#> <chr> <int>
#> 1 185 3
#> 2 189 2
#> 3 31 2
Upvotes: 1
Reputation: 1
We can now use the tally function to make this easier.
tally(group_by(mydf, Spp, Cnty)) Spp Cnty n <fctr> <fctr> <int> 1 Bitternut 185 2 2 Bitternut 189 1 3 Pignut 185 2 4 Pignut 189 1 5 Pignut 31 1 6 WO 185 1 7 WO 31 2
Upvotes: -1
Reputation: 13827
A simple solution using the data.table
approach.
library(data.table)
output <- setDT(mydf)[ , .(count=.N) , by = .(Spp,Cnty)]
in case you want to reshape the output into a nicer table format:
library(tidyr)
spread(data=a, key =Spp, count)
# Cnty Bitternut Pignut WO
# 1: 185 2 2 1
# 2: 189 1 1 NA
# 3: 31 NA 1 2
# or perhaps like this:
spread(data=a, key =Cnty, count)
# Spp 185 189 31
# 1: Bitternut 2 1 NA
# 2: Pignut 2 1 1
# 3: WO 1 NA 2
Upvotes: 0
Reputation: 118
I wanted to add on to what A Handcart And Mohair mentioned. For those of you wanting to get the results of the code below into a data frame(helpful in R studio)...
with(mydf, table(Spp, Cnty))
# Cnty
# Spp 185 189 31
# Bitternut 2 1 0
# Pignut 2 1 1
# WO 1 0 2
ftable(mydf, row.vars="Spp", col.vars=c("Cnty", "Yr"))
# Cnty 185 189 31
# Yr 1999 2000 1999 2000 1999 2000
# Spp
# Bitternut 1 1 0 1 0 0
# Pignut 2 0 0 1 0 1
# WO 0 1 0 0 2 0
You'll need to put the as.data.frame.matrix modifier in front of your code like so:
as.data.frame.matrix(with(mydf, table(Spp, Cnty)))
I was pretty new to R when I came upon this post, and it took me a long time to figure that out, so I thought I'd share.
Upvotes: 0
Reputation: 1
with(mydf, tapply(Spp, list(Cnty, Yr),
FUN = function(x) length(unique(x))))
unique query is not working with large data set i mean data more than 1000k row.
Upvotes: 0
Reputation: 193687
I've tried to make your sample data a little bit more interesting. Your sample data presently has just one unique "Spp" per "Cnty".
set.seed(1)
mydf <- data.frame(
Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
Yr = c(rep(c("1999", "2000"), times = c(3, 2)),
"1999", "1999", "2000", "2000", "2000"),
Plt = "20001",
Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
DBH = runif(10, 0, 15)
)
mydf
# Cnty Yr Plt Spp DBH
# 1 185 1999 20001 Bitternut 3.089619
# 2 185 1999 20001 Pignut 2.648351
# 3 185 1999 20001 Pignut 10.305343
# 4 185 2000 20001 WO 5.761556
# 5 185 2000 20001 Bitternut 11.547621
# 6 31 1999 20001 WO 7.465489
# 7 31 1999 20001 WO 10.764278
# 8 31 2000 20001 Pignut 14.878591
# 9 189 2000 20001 Pignut 5.700528
# 10 189 2000 20001 Bitternut 11.661678
Next, as suggested, tapply
is a good candidate here. Combine unique
and length
to get the data you are looking for.
with(mydf, tapply(Spp, Cnty, FUN = function(x) length(unique(x))))
# 185 189 31
# 3 2 2
with(mydf, tapply(Spp, list(Cnty, Yr), FUN = function(x) length(unique(x))))
# 1999 2000
# 185 2 2
# 189 NA 2
# 31 1 1
If you're interested in simple tabulation (not of unique values), then you can explore table
and ftable
:
with(mydf, table(Spp, Cnty))
# Cnty
# Spp 185 189 31
# Bitternut 2 1 0
# Pignut 2 1 1
# WO 1 0 2
ftable(mydf, row.vars="Spp", col.vars=c("Cnty", "Yr"))
# Cnty 185 189 31
# Yr 1999 2000 1999 2000 1999 2000
# Spp
# Bitternut 1 1 0 1 0 0
# Pignut 2 0 0 1 0 1
# WO 0 1 0 0 2 0
Upvotes: 19
Reputation: 366
As Justin mentioned aggregate is probably what you want. If you call your data frame foo, then the following should give you what you want, namely the number of individuals per species assuming that each row with Butternut represents a unique individual belonging to the butternut species. Note I used foo$Age to calculate the length of the vector, i.e. the number of individuals (row) belonging to each species, but you could use foo$Ht or foo$DBH etc.
aggregate(foo$Age, by = foo[c('Spp','Cnty')], length)
Cheers,
Danny
Upvotes: 2