Reputation: 13
The "RetailSales2014" column contains money values. I know I need to remove the commas before I can run any statistical analysis, but do I also need to remove the leading '$' symbols? If so, how would I remove them?
# Load packages
library("XML")
library("RCurl")
url <- "https://nrf.com/2015/top100-table"
url_content <- getURL(url)
doc <- htmlParse(url_content)
tables <- readHTMLTable(doc)
retailer_df <- data.frame(tables)
attributes(retailer_df)
colnames(retailer_df) <- c("Rank","Company","Headquarter","RetailSales2014","USASalesGrowth","WorldwideRetailSales","USAPercentageOfWorldwideSales","Stores2014","Growth")
summary(retailer_df)
write.csv(retailer_df, file = "top100retailers2015.csv")
Upvotes: 0
Views: 53
Reputation: 2085
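# "\\D" matches every non-digit character, so this strips "$" and "," in one pass
# (it would also strip decimal points and minus signs, if the column had any)
retailer_df$RetailSales2014 <- as.numeric(gsub("\\D", "", retailer_df$RetailSales2014))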
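To the other half of the question: yes, the '$' has to go as well, because as.numeric() returns NA for any string that still contains it. If you would rather keep decimal points, a narrower pattern that removes only "$" and "," may be safer. This is a minimal sketch; the `sales` vector holds made-up sample values for illustration, not data taken from the scraped table.

```r
# Hypothetical sample values standing in for the scraped column
sales <- c("$343,624,000", "$103,033,000", "$1,234.56")

# as.numeric() cannot parse "$" or ",", so the raw strings all become NA
raw_attempt <- suppressWarnings(as.numeric(sales))

# Remove only "$" and ","; digits, decimal points, and minus signs are kept
clean_sales <- as.numeric(gsub("[$,]", "", sales))

print(raw_attempt)
print(clean_sales)
```

The same gsub() call should work on retailer_df$RetailSales2014 directly, since gsub() coerces its input to character before matching.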
Upvotes: 0