Reputation: 4850
How do I get a dataframe like this:
soccer_player country position
"sam" USA left defender
"jon" USA right defender
"sam" USA left midfielder
"jon" USA offender
"bob" England goalie
"julie" England central midfielder
"jane" England goalie
To look like this (country with the counts of unique players per country):
country player_count
USA 2
England 3
The obvious complication is that there are multiple observations per player, so I cannot simply do table(df$country)
to get the number of observations per country.
I have been playing with the table()
and merge()
functions but have not had any luck.
Upvotes: 11
Views: 16464
Reputation: 70336
One more base R option, using aggregate
:
> aggregate(soccer_player ~ country, dd, FUN = function(x) length(unique(x)))
# country soccer_player
#1 England 3
#2 USA 2
Upvotes: 1
Reputation: 270438
Here is an sqldf solution:
library(sqldf)
sqldf("select country, count(distinct soccer_player) player_count
from df
group by country")
## country player_count
## 1 England 2
## 2 USA 2
and here is a base R solution:
as.data.frame(xtabs(~ country, unique(df[1:2])), responseName = "player_count")
## country player_count
## 1 England 2
## 2 USA 2
Upvotes: 3
Reputation: 227001
The new features of dplyr v 3.0 provide a compact solution:
Data:
dd <- read.csv(text='
soccer_player,country,position
"sam",USA,left defender
"jon",USA,right defender
"sam",USA,left midfielder
"jon",USA,offender
"bob",England,goalie
"julie",England,central midfielder
"jane",England,goalie')
Code:
library(dplyr)
dd %>% distinct(soccer_player,country) %>%
count(country)
Upvotes: 7
Reputation: 44634
Here's one way:
as.data.frame(table(unique(d[-3])$country))
# Var1 Freq
# 1 England 3
# 2 USA 2
Drop the third column, remove any duplicate Country-Name pairs, then count the occurrences of each country.
Upvotes: 7
Reputation: 17432
Without using any packages you can do:
List = by(df, df$country, function(x) length(unique(x$soccer_player)))
DataFrame = do.call(rbind, lapply(names(List), function(x)
data.frame(country=x, player_count=List[[x]])))
# country player_count
#1 England 2
#2 USA 2
It's easier with something like data.table
:
dt = data.table(df)
dt[,list(player_count = length(unique(soccer_player))),by=country]
Upvotes: 6