Reputation: 35
Dataset can be downloaded from here
library(dplyr)
NBA <- read.csv("NBA Season Dataset/Seasons_Stats.csv")
NBA$Player <- as.character(NBA$Player)
PlayerData <- read.csv("NBA Season Dataset/player_data.csv")
PlayerData$name <- as.character(PlayerData$name)
I want to take their height & weight from PlayerData
then combine with the main data NBA
. The problem is this NBA players dataset contains some players share the same name with other players, so I need to make their names distinct before I merge two data frame together with merge
by player's name.
PlayerData[duplicated(PlayerData$name), "name"]
give me 50 duplicated names.
So I created a function which will rename player in both data frame based on the years they active:
unduplicate <- function(name, year_start, year_end, new_name) {
PlayerData[PlayerData$name == name & PlayerData$year_start == year_start & PlayerData$year_end == year_end, 1] = new_name
NBA[NBA$Player == name & NBA$Year <= year_end & NBA$Year >= year_start, "Player"] = new_name
}
For an example, I take these two players who share the same name.
And then call the function:
unduplicate("Dee Brown", 1991, 2002, "Dee Brown 1")
unduplicate("Dee Brown", 2007, 2009, "Dee Brown 2")
Nothing changed...
But if I do manually like this:
PlayerData[PlayerData$name == "Dee Brown" & PlayerData$year_start == 1991 & PlayerData$year_end == 2002, 1] = "Dee Brown 1"
NBA[NBA$Player == "Dee Brown" & NBA$Year <= 2002 & NBA$Year >= 1991, "Player"] = "Dee Brown 1"
PlayerData[PlayerData$name == "Dee Brown" & PlayerData$year_start == 2007 & PlayerData$year_end == 2009, 1] = "Dee Brown 2"
NBA[NBA$Player == "Dee Brown" & NBA$Year <= 2009 & NBA$Year >= 2007, "Player"] = "Dee Brown 2"
Then come to the desired result:
So my question is,
1) What's wrong with the function? I checked and tried many variations didn't work.
2) Is there any way better to solve this problem?
I'm kinda new to this, so forgive me if this just stupid beginner's mistake.
Thanks!
Upvotes: 0
Views: 45
Reputation: 658
You can use distinct from dplyr to select unique players based on set of variables. Sqldf library gives possibility to merge tables based on condition with inequality:
library(dplyr)
player_data <- read.csv("player_data.csv", stringsAsFactors = F)
Players <- read.csv("Players.csv", stringsAsFactors = F)
NBA1<- read.csv("Seasons_Stats.csv", stringsAsFactors = F)
Dist_players <-player_data%>%
distinct(name, year_start, year_end, height, weight )
library(sqldf)
Final <- sqldf("SELECT * FROM NBA1 JOIN Dist_players ON NBA1.Player = Dist_players.name
WHERE NBA1.Year >= Dist_players.year_start AND NBA1.Year <= Dist_players.year_end")
Upvotes: 1