Reputation: 513
I have this table:
library(rvest)
library(tidyverse)
tables_team_pl <- read_html('https://www.win-or-lose.com/football-team-colours/')
color_table <- tables_team_pl %>% html_table() %>% pluck(1) %>% select(-Away)
and also this one:
table_1 <- structure(list(Team = c("Arsenal", "Aston Villa", "Blackburn",
"Bolton", "Chelsea", "Everton", "Fulham", "Liverpool", "Manchester City",
"Manchester Utd", "Newcastle Utd", "Norwich City", "QPR", "Stoke City",
"Sunderland", "Swansea City", "Tottenham", "West Brom", "Wigan Athletic",
"Wolves")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-20L))
As you can see the second table has its names incomplete. for example, Manchester Utd
should be Manchester United
as in the first table.
So, all I need is to complete this second table extracting the same names from the first table.
So, I will have table_1 corrected: Manchester Utd should change to Manchester Unites, Blackburn should change to Blackburn Rovers and so on. The complete names should come from the first table.
Also on the second table I have QPR, whcih should be "Queens Park Rangers".
Any help?
Upvotes: 2
Views: 87
Reputation: 19088
Here's a base R solution using agrep
. It has the neat feature of allowing to set a maximum number of insertions, deletions and substitutions to achieve a match.
table_1_original <- table_1
table_1$Team <- data.frame( Team=sapply( as.matrix(table_1), function(x){
a=agrep( x, tables_team_pl,
max=list(insert=0,del=0,subs=3));
if(!identical(a, integer(0))){ tables_team_pl[a] }
else{ x } } ) )
Result including a comparison to the original:
cbind(table_1_original, table_1)
Team Team
1 Arsenal Arsenal
2 Aston Villa Aston Villa
3 Blackburn Blackburn Rovers
4 Bolton Bolton
5 Chelsea Chelsea
6 Everton Everton
7 Fulham Fulham
8 Liverpool Liverpool
9 Manchester City Manchester City
10 Manchester Utd Manchester United
11 Newcastle Utd Newcastle United
12 Norwich City Norwich City
13 Queens Queens Park Rangers
14 Stoke City Stoke City
15 Sunderland Sunderland
16 Swansea City Swansea City
17 Tottenham Tottenham Hotspur
18 West Brom West Bromwich Albion
19 Wigan Athletic Wigan Athletic
20 Wolves Wolverhampton Wanderers
filtered HTML data without colors:
tables_team_pl <- c("Aberdeen", "AFC Bournemouth", "AFC Wimbledon", "Arsenal",
"Aston Villa", "Birmingham City", "Blackburn Rovers", "Bradford City",
"Brentford", "Brighton & Hove Albion", "Bristol City", "Burnley",
"Cardiff City", "Celtic", "Chelsea", "Crystal Palace", "Derby County",
"Dundee", "Dundee United", "Everton", "Fulham", "Hamilton Academical",
"Heart of Midlothian", "Hibernian", "Huddersfield Town", "Hull City",
"Inverness Caledonian Thistle", "Kilmarnock", "Leeds United",
"Leicester City", "Liverpool", "Livingston", "Manchester City",
"Manchester United", "Middlesbrough", "Millwall", "Motherwell",
"Newcastle United", "Norwich City", "Nottingham Forest", "Partick Thistle",
"Portsmouth", "Preston North End", "Queens Park Rangers", "Rangers",
"Reading", "Ross County", "Rotherham", "Sheffield United", "Sheffield Wednesday",
"Southampton", "St Johnstone", "St Mirren", "Stoke City", "Sunderland",
"Swansea", "Tottenham Hotspur", "Watford", "West Bromwich Albion",
"West Ham United", "Wolverhampton Wanderers", "Wycombe Wanderers")
Upvotes: 1
Reputation: 887158
We may use a strindist
join
library(fuzzyjoin)
library(dplyr)
stringdist_left_join(table_1, color_table, by = "Team", method = "soundex") %>%
transmute(Team = coalesce(Team.y, Team.x)) %>%
distinct
Upvotes: 2