Reputation: 367
I'm working with GSS data, and one of its variables is Total family income. I think income below $10,000 is unnecessarily split into 8 groups, so I want to merge them into one. I tried doing this with str_replace, but it doesn't seem to have any effect.
I run:
GSS2018$`Total family income` <- str_replace(GSS2018$`Total family income`,
"Under $1 000",
"Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
"$1 000 to 2 999",
"Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
"$3 000 to 3 999",
"Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
"$4 000 to 4 999",
"Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
"$5 000 to 5 999",
"Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
"$6 000 to 6 999",
"Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
"$7 000 to 7 999",
"Under $10000")
GSS2018$`Total family income` <- str_replace(GSS2018$`Total family income`,
"$8 000 to 9 999",
"Under $10000")
However, nothing seems to change after I run these. The strings just remain the same. What am I doing wrong here?
Upvotes: 1
Views: 151
Reputation: 1386
str_replace treats its pattern as a "regular expression" (you can look them up for more information). As such, some characters carry a special meaning instead of matching themselves. One example is the $ symbol, which in a regular expression matches the end of the string. Your patterns ask for more text after that end-of-string position, which can never happen, so none of them match and nothing is replaced.
The solution is to escape the character, which says "no, I really mean $": write \\$ instead of $. (The backslash is doubled because an R string literal needs \\ to produce a single backslash.)
The first line would therefore become
GSS2018$`Total family income` <- str_replace(GSS2018$`Total family income`,
"Under \\$1 000",
"Under $10000")
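To see the difference on a small scale, here is a minimal sketch on an invented vector (`income` is a hypothetical stand-in for the real column). It also shows `fixed()`, a stringr helper not used above, which makes the pattern match literally so no escaping is needed.

```r
library(stringr)

# Hypothetical sample values standing in for GSS2018$`Total family income`.
income <- c("Under $1 000", "$1 000 to 2 999", "$25 000 or more")

# Unescaped: "$" is read as an end-of-string anchor, so nothing matches.
unescaped <- str_replace(income, "$1 000 to 2 999", "Under $10000")

# Escaped: "\\$" matches a literal dollar sign.
escaped <- str_replace(income, "\\$1 000 to 2 999", "Under $10000")

# fixed(): the whole pattern is taken literally, no regular expression at all.
literal <- str_replace(income, fixed("$1 000 to 2 999"), "Under $10000")

print(unescaped)
print(escaped)
print(literal)
```

The unescaped call should leave the vector untouched, while the other two should change only the second element.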
Nevertheless, it appears that your solution can be much simpler. Rather than doing the same thing multiple times, you can do the following:
GSS2018$`Total family income`[GSS2018$`Total family income` %in% c("Under $1 000",
"$1 000 to 2 999",
...,
"$8 000 to 9 999")] <- "Under $10000"
Where ... is replaced with the remaining values you're after.
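As a rough illustration of the same idea, here is a sketch on a toy data frame. `gss_demo` and its rows are invented for the example; the eight labels are the ones from your question. It assumes the column is a character vector. If it is a factor, convert it with as.character() first, since assigning a value that is not an existing level produces NA.

```r
# Toy stand-in for GSS2018; the rows are made up for illustration.
gss_demo <- data.frame(
  `Total family income` = c("Under $1 000", "$3 000 to 3 999", "$25 000 or more"),
  check.names = FALSE,       # keep the column name with spaces as-is
  stringsAsFactors = FALSE   # keep the column as character, not factor
)

# The eight low-income labels listed in the question.
low_income <- c("Under $1 000", "$1 000 to 2 999", "$3 000 to 3 999",
                "$4 000 to 4 999", "$5 000 to 5 999", "$6 000 to 6 999",
                "$7 000 to 7 999", "$8 000 to 9 999")

# %in% compares whole strings literally, so "$" needs no escaping here.
is_low <- gss_demo$`Total family income` %in% low_income
gss_demo$`Total family income`[is_low] <- "Under $10000"

print(gss_demo)
```

Because no regular expression is involved, this sidesteps the escaping problem entirely and handles all eight labels in one assignment.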
Upvotes: 3