Reputation: 154
I have a data frame in R that has information about NBA players, including salary information. Every value in the Salary column has a "$" before the number, and I want to convert this character data to numeric for analysis, so I need to remove the "$" from the column. However, I am unable to subset or parse any of the values in this column. It seems that each value is a character vector of length 1. I've included below the structure of the data and what I have tried in my attempt at removing the "$".
> str(combined)
'data.frame': 588 obs. of 9 variables:
$ Player: chr "Aaron Brooks" "Aaron Gordon" "Aaron Gray" "Aaron Harrison" ...
$ Tm : Factor w/ 30 levels "ATL","BOS","BRK",..: 4 22 9 5 9 18 1 5 25 30 ...
$ Pos : Factor w/ 5 levels "C","PF","PG",..: 3 2 NA 5 NA 2 1 1 4 5 ...
$ Age : num 31 20 NA 21 NA 24 29 31 25 33 ...
$ G : num 69 78 NA 21 NA 52 82 47 82 13 ...
$ MP : num 1108 1863 NA 93 NA ...
$ PER : num 11.8 17 NA 4.3 NA 5.6 19.4 18.2 12.7 9.2 ...
$ WS : num 0.9 5.4 NA 0 NA -0.5 9.4 2.8 4 0.3 ...
$ Salary: chr "$2000000" "$4171680" "$452059" "$525093" ...
combined[, "Salary"] <- gsub("$", "", combined[, "Salary"])
The last line of code above runs without error, but it doesn't change the "Salary" column.
I am able to change it by running the code listed below, but I need to find a way to automate the replacement for the whole data set instead of doing it row by row.
combined[, "Salary"] <- gsub("$2000000", "2000000", combined[, "Salary"])
How can I subset the character vectors in this column to remove the "$"? Apologies for any formatting faux pas ahead of time, this is my first time asking a question. Cheers,
Upvotes: 1
Views: 668
Reputation: 887431
The $ is a regex metacharacter which matches the end of the string. So, we need to either escape it (\\$), place it in square brackets ("[$]"), or use fixed = TRUE in sub. We don't need gsub as there seems to be only a single $ character in each string.
combined[, "Salary"] <- as.numeric(sub("$", "", combined[, "Salary"], fixed=TRUE))
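To see why the original gsub call left the column untouched, and that the three options above behave the same way, here is a minimal sketch on a small vector. The vector salary and the result names are made up for illustration and stand in for combined[, "Salary"].

```r
# Hypothetical stand-in for the Salary column in the question
salary <- c("$2000000", "$4171680", "$452059", "$525093")

# An unescaped "$" is an end-of-string anchor, so it only matches the
# empty position at the end of each string and nothing visible is removed
unchanged <- gsub("$", "", salary)

# Three ways to match a literal dollar sign instead
escaped   <- sub("\\$", "", salary)               # escape the metacharacter
bracketed <- sub("[$]", "", salary)               # put it in a character class
fixed     <- sub("$", "", salary, fixed = TRUE)   # treat the pattern as plain text

# Once the "$" is gone the strings convert cleanly to numbers
salary_num <- as.numeric(fixed)
```

Any of the three results can be passed to as.numeric; fixed = TRUE is probably the easiest to read because the pattern needs no escaping.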
Or, as @gung mentioned in the comments, using substr would be faster (this assumes every value starts with a $):
as.numeric(substr(combined$Salary, 2, nchar(combined$Salary)))
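The positional approach can be sketched as follows, again on a made-up vector rather than the real data. It also shows the one caveat: substr drops the first character whatever it is, so a value that lacks the leading $ (the hypothetical no_prefix below) would lose a digit, whereas sub with fixed = TRUE would leave it alone.

```r
# Hypothetical stand-in for the Salary column
salary <- c("$2000000", "$4171680", "$452059")

# Drop the first character by position; no pattern matching involved
by_substr    <- as.numeric(substr(salary, 2, nchar(salary)))

# substring() defaults to reading through the end of the string
by_substring <- as.numeric(substring(salary, 2))

# Caveat: a value without the "$" prefix loses its first digit
no_prefix <- "525093"
dropped   <- substr(no_prefix, 2, nchar(no_prefix))
kept      <- sub("$", "", no_prefix, fixed = TRUE)
```

If every value is known to start with $, the positional version is fine; if the column might be mixed, the sub version is the safer choice.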
Upvotes: 2