Reputation: 119
I am trying to follow along with a tutorial on ggplot, but my data set lists dollar values with $ and percent values with %, which makes plotting impossible: R says the values must be numeric.
For example, my data set is named housing and the column with the home prices is labeled Home.Value. The prices are formatted like this: $24,895 $25,175
How would I go about removing the dollar sign and the percent sign?
Upvotes: 2
Views: 2864
Reputation: 49650
This answer shows a method for removing commas when reading the data into R. It can easily be modified to remove $, %, and other characters as well: just change gsub(",","", from) to gsub("[,$%]","", from).
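As a rough sketch of what that character-class pattern does on its own (the vector x is made up for illustration, and the pattern is applied here after the data is already in R rather than while reading it):

```r
# Hypothetical sample values; the first two use the format from the question
x <- c("$24,895", "$25,175", "12%")

# Inside [...] the $ is a literal character, so no escaping is needed
cleaned <- gsub("[,$%]", "", x)

# With the symbols stripped, the strings convert cleanly to numbers
values <- as.numeric(cleaned)
print(values)
```

Note that a percent value such as "12%" becomes 12, not 0.12; divide by 100 afterwards if you need a proportion.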
Upvotes: 0
Reputation: 4230
Suppose you have a data frame like this one:
df<-data.frame(A=c("$5,33","$3,55"),B=c(T,F))
Then you could replace column A with
df$A<-gsub("\\$","",df$A)
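You have to escape $ with \\ (or pass fixed=TRUE) so that gsub treats it as a literal character rather than as the regex end-of-string anchor. % is not a special character and needs no escaping.
If you want to handle $ and % in one line, you can use the "OR" operator (|):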
df$A<-gsub("\\$|%","",df$A)
UPDATE:
That may be all you want, but keep in mind that your numbers are formatted with commas, so R will still treat them as character strings. You will probably need to remove the commas as well.
To do that, strip the commas with gsub and convert the result with as.numeric. The pattern "\\," is used here; a comma is not a regex metacharacter, so a plain "," works too.
df$A<-as.numeric(gsub("\\,","",df$A))
df
    A     B
1 533  TRUE
2 355 FALSE
Notice that column A is now numeric:
str(df)
'data.frame': 2 obs. of 2 variables:
$ A: num 533 355
$ B: logi TRUE FALSE
Again, you could have done everything in one line, but I'm guessing two lines will be easier for you to follow.
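For reference, a minimal sketch of the one-pass version applied to something shaped like the housing data from the question. The data frame below is a stand-in: Home.Value is the column named in the question, while the Rate column and the to_number helper are invented here for illustration.

```r
# Hypothetical stand-in for the housing data from the question;
# Home.Value matches the question, Rate is an invented percent column
housing <- data.frame(
  Home.Value = c("$24,895", "$25,175"),
  Rate = c("5%", "7.5%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip $, commas, and % in one pass, then convert
to_number <- function(x) as.numeric(gsub("[$,%]", "", x))

# Apply the helper to every column that holds formatted numbers
cols <- c("Home.Value", "Rate")
housing[cols] <- lapply(housing[cols], to_number)

str(housing)
```

Wrapping the pattern in a small function keeps the cleaning rule in one place, which helps when several columns share the same formatting. The percent column ends up as 5 and 7.5, so divide by 100 if the plot needs proportions.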
Upvotes: 4