user6794408

Reputation: 13

How do I get statistics of column "RetailSales2014"?

The "RetailSales2014" column contains dollar amounts. I know I need to remove the commas before running any statistical analysis, but do I also need to remove the leading '$' symbols? If so, how do I remove them?

# Load packages
library("XML")
library("RCurl")

# Specify URL
url <- "https://nrf.com/2015/top100-table"

# Download the content of the URL
url_content <- getURL(url)

# Parse the HTML/XML content to generate an R structure representing the HTML/XML tree
doc <- htmlParse(url_content)

# Extract every HTML table in the document as a list of data frames
tables <- readHTMLTable(doc)

# Convert the 3rd element of the list to a data frame
retailer_df <- data.frame(tables[[3]])

attributes(retailer_df)

# Rename retailer_df columns
colnames(retailer_df) <- c("Rank","Company","Headquarter","RetailSales2014","USASalesGrowth","WorldwideRetailSales","USAPercentageOfWorldwideSales","Stores2014","Growth")

summary(retailer_df)

# Write the retailer data to a CSV file in the working directory
write.csv(retailer_df, file = "top100retailers2015.csv")

Upvotes: 0

Views: 53

Answers (1)

AidanGawronski

Reputation: 2085

retailer_df$RetailSales2014 <- 
    as.numeric(gsub("(\\D)", "", retailer_df$RetailSales2014))
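To the question of whether the '$' has to go as well: yes, `as.numeric()` returns `NA` for any string that still contains a '$' or a comma. Note that `\\D` strips every non-digit, including a decimal point, so it is only safe if the column holds whole-dollar amounts. Here is a minimal sketch of a variant that keeps the decimal point. The sample values and the names `sales`, `raw_attempt`, and `sales_num` are made up for illustration, since I am only assuming what the scraped column looks like.

```r
# Hypothetical sample values; the real column is assumed to look similar
sales <- c("$1,234,567", "$89,000", "$2,500.50")

# Without cleaning, as.numeric() cannot parse "$" or "," and yields NA
raw_attempt <- suppressWarnings(as.numeric(sales))

# Remove every character except digits and the decimal point, then convert
sales_num <- as.numeric(gsub("[^0-9.]", "", sales))

# Basic statistics on the cleaned numeric vector
summary(sales_num)
mean(sales_num)
sd(sales_num)
```

If the column was read in as a factor, `gsub()` still works because it coerces its input to character first. Once the column is numeric, `summary(retailer_df$RetailSales2014)` should report the quartiles and mean instead of a count per string.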

Upvotes: 0
