Reputation: 83
My issue is that, with great assistance from this community, I've managed to scrape much of the data I want; however, I have not managed to organize it in any meaningful way. The links in source are a representative sample of the many, many links I have for this project.
library(rvest)
library(tidyverse)

# source links
source <- c("http://www.ufcstats.com/fighter-details/f2688492b9a525a3",
            "http://www.ufcstats.com/fighter-details/f1fac969a1d70b08")

fp_e <- map(source, function(career_data) {
  read_html(career_data) %>%
    html_nodes("div ul li") %>%
    html_text() %>%
    # cleans up the data a bit
    str_replace_all(., "\n\\s+\n\\s+", "") %>%
    as.data.frame(.)
})
What I want to do with this list is turn it into a usable data frame. My original idea was to transpose() it after as.data.frame(); however, all that did was put everything in a single row. Additionally, I was unable to index the data frame. This led me to believe the data frame was not set up how I thought it was. I'd like to be more specific here, but I'm honestly quite confused at this point.
Searching around, I found this question, and the answer by neilfws gave me the idea of building the data frame first and inserting the data into it; however, I don't even know where to start. I'm also unsure whether that's necessary when it's already set up in a format I like.
This is the first real-world R application I've tried, and I'm really stumbling on how to organize this data. Thank you for any help and suggestions!
Upvotes: 2
Views: 113
Reputation: 389135
You can do some data cleaning with the tidyverse library:
library(tidyverse)
library(rvest)

map(source, function(career_data) {
  read_html(career_data) %>%
    html_nodes("div ul li") %>%
    html_text() %>%
    # strip leading/trailing whitespace and newlines
    trimws(whitespace = '[\\s\n]') %>%
    tibble(data = .) %>%
    # split "Property: Value" at the colon
    separate(data, c('Property', 'Value'), sep = ':') %>%
    # drop items without a colon (their Value is NA)
    na.omit() %>%
    mutate(Value = trimws(Value, whitespace = '[\n\\s]'))
})
This returns:
#[[1]]
# A tibble: 13 x 2
#   Property  Value
#   <chr>     <chr>
# 1 Height    "5' 6\""
# 2 Weight    "135 lbs."
# 3 Reach     "68\""
# 4 STANCE    "Orthodox"
# 5 DOB       "Oct 16, 1981"
# 6 SLpM      "3.70"
# 7 Str. Acc. "39%"
# 8 SApM      "2.70"
# 9 Str. Def  "66%"
#10 TD Avg.   "2.28"
#11 TD Acc.   "31%"
#12 TD Def.   "65%"
#13 Sub. Avg. "0.3"

#[[2]]
# A tibble: 13 x 2
#   Property  Value
#   <chr>     <chr>
# 1 Height    "6' 2\""
# 2 Weight    "170 lbs."
# 3 Reach     "74\""
# 4 STANCE    "Southpaw"
# 5 DOB       "Aug 25, 1991"
# 6 SLpM      "2.53"
# 7 Str. Acc. "47%"
# 8 SApM      "2.05"
# 9 Str. Def  "55%"
#10 TD Avg.   "1.39"
#11 TD Acc.   "31%"
#12 TD Def.   "70%"
#13 Sub. Avg. "0.4"
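If the goal is a single data frame with one row per fighter, one possible next step is to stack the per-fighter tibbles and pivot them wide. The following is a minimal sketch rather than a definitive solution: it assumes the list returned by map() has been stored in a variable (called fighter_stats here, a hypothetical name), and it uses two small hand-built tibbles with only three of the properties in place of the scraped results so that it runs offline. pivot_wider() needs tidyr 1.0.0 or later.

```r
library(tidyverse)

# Stand-in for the list returned by map(); `fighter_stats` is a
# hypothetical name, and only three of the 13 properties are reproduced.
fighter_stats <- list(
  tibble(Property = c("Height", "Weight", "STANCE"),
         Value    = c("5' 6\"", "135 lbs.", "Orthodox")),
  tibble(Property = c("Height", "Weight", "STANCE"),
         Value    = c("6' 2\"", "170 lbs.", "Southpaw"))
)

# Stack the tibbles, recording which list element each row came from,
# then turn the Property names into columns: one row per fighter.
fighters_wide <- fighter_stats %>%
  bind_rows(.id = "fighter") %>%
  pivot_wider(names_from = Property, values_from = Value)

fighters_wide
```

With an unnamed list, the fighter column simply holds the list positions ("1", "2"). If the list elements are named, for example by the URL each one was scraped from, bind_rows() uses those names instead. All values are still character strings at this point, so numeric columns would need converting separately.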
Upvotes: 2