Pedro Guizar

Reputation: 367

How to string replace rows in a data table

I'm working with GSS data, and one of the variables is Total family income. I think income below $10,000 is unnecessarily split into 8 groups, so I want to merge them into one. I tried doing this with str_replace and str_replace_all, but the calls don't seem to have any effect.

I run:

GSS2018$`Total family income` <- str_replace(GSS2018$`Total family income`,
                                                 "Under $1 000",
                                                  "Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
                                                 "$1 000 to 2 999",
                                                 "Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
                                                 "$3 000 to 3 999",
                                                 "Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
                                                 "$4 000 to 4 999",
                                                 "Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
                                                 "$5 000 to 5 999",
                                                 "Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
                                                 "$6 000 to 6 999",
                                                 "Under $10000")
GSS2018$`Total family income` <- str_replace_all(GSS2018$`Total family income`,
                                                 "$7 000 to 7 999",
                                                 "Under $10000")
GSS2018$`Total family income` <- str_replace(GSS2018$`Total family income`,
                                                 "$8 000 to 9 999",
                                                 "Under $10000")

However, nothing seems to change after I run these. The strings just remain the same. What am I doing wrong here?

Upvotes: 1

Views: 151

Answers (1)

Daniel V

Reputation: 1386

str_replace treats its pattern as a "regular expression" (you can look them up for more information). In a regular expression, some characters have a special meaning beyond their literal value. One example is the $ symbol, which matches the end of a string rather than a literal dollar sign. Your patterns ask for more text after that end-of-string position, which can never happen, so none of them match and nothing is replaced.
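To see the anchor behaviour in isolation, here is a minimal sketch on a small made-up vector (the `income` vector is a stand-in for the real column, not the GSS data itself). It assumes the stringr package is installed.

```r
library(stringr)

# Stand-in values copied from the question's category labels.
income <- c("Under $1 000", "$1 000 to 2 999", "$3 000 to 3 999")

# "$" at the end of a pattern anchors it to the end of the string.
ends_in_000 <- str_detect(income, "000$")

# "$" followed by more text asks for characters after the end: no match.
dollar_first <- str_detect(income, "$1 000")

# With no match, str_replace returns its input unchanged.
unchanged <- str_replace(income, "$1 000 to 2 999", "Under $10000")

print(ends_in_000)
print(dollar_first)
print(identical(unchanged, income))
```

Only the first value ends in "000", and none of the values match once the pattern continues past the `$`, which is why the original calls appear to do nothing.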

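The solution is to escape the character, which says "no, I really mean $": write \\$ instead of $ in the pattern. The backslash is doubled because R string literals also use the backslash as an escape character.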

The first line would therefore become

GSS2018$`Total family income` <- str_replace(GSS2018$`Total family income`,
                                             "Under \\$1 000",
                                              "Under $10000")
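As a possible alternative to escaping by hand, stringr's `fixed()` modifier tells the function to treat the pattern as literal text. The sketch below compares both forms on a made-up `income` vector (a stand-in, not the real GSS column) and assumes stringr is installed.

```r
library(stringr)

# Stand-in values; the real column will contain more categories.
income <- c("Under $1 000", "$1 000 to 2 999", "$10000 to 14999")

# Escaped regular expression: "\\$" matches a literal dollar sign.
escaped <- str_replace(income, "Under \\$1 000", "Under $10000")

# fixed(): the whole pattern is matched literally, so no escaping is needed.
literal <- str_replace(income, fixed("Under $1 000"), "Under $10000")

print(escaped)
print(identical(escaped, literal))
```

Both calls change only the first element. `fixed()` can be easier to read when the pattern contains several special characters, at the cost of losing regular expression features.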

That said, your solution can be much simpler. Rather than repeating the same replacement for every category, you can select all the matching rows at once and overwrite them in a single assignment:

GSS2018$`Total family income`[GSS2018$`Total family income` %in% c("Under $1 000",
                                                                   "$1 000 to 2 999",
                                                                   ...,
                                                                   "$8 000 to 9 999")] <- "Under $10000"

Where ... is replaced with the values you're after.
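Here is a runnable sketch of the same idea on a toy data frame. The name `gss_toy`, its three rows, and the "$25000 or more" label are hypothetical; the low-income labels are the ones listed in the question. It uses base R only and assumes the column is stored as character.

```r
# Toy stand-in for GSS2018 (hypothetical rows; the real data will differ).
gss_toy <- data.frame(
  `Total family income` = c("Under $1 000", "$5 000 to 5 999", "$25000 or more"),
  check.names = FALSE,       # keep the column name with spaces
  stringsAsFactors = FALSE   # keep the column as character, not factor
)

# Category labels from the question that should be merged.
low_income <- c("Under $1 000", "$1 000 to 2 999", "$3 000 to 3 999",
                "$4 000 to 4 999", "$5 000 to 5 999", "$6 000 to 6 999",
                "$7 000 to 7 999", "$8 000 to 9 999")

# %in% compares whole strings literally, so "$" needs no escaping here.
is_low <- gss_toy$`Total family income` %in% low_income
gss_toy$`Total family income`[is_low] <- "Under $10000"

print(gss_toy)
```

If the column is a factor instead of character, assigning a label that is not already one of its levels produces NA, so converting it with as.character() first is the safer route.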

Upvotes: 3
