jalapic

Reputation: 14222

Replace all instances of numbers in a dataframe with strings in R

I want to replace all numbers in a dataframe with words/strings. Every occurrence of a given number should be replaced with the same word, e.g. all instances of the number 5 should be replaced with 'banana', all instances of the number 10 with 'kiwi', and so on.

Here is a sample dataframe. Rownames and colnames are numbers too:

#    1  2  3  4  5  6
#1   7  7  7  7  7  7
#2   5  5  5  5  5  5
#3   4  4  4  4  4  4
#4   8  8  8  8  8  8
#5   1  1  1  1  1  1
#6   2  2  2  2  2  2
#7   6  6  6  6  3  3
#8   3  3  3  3  6  6
#9  10 10 10 10 10 10
#10 11 11 11 11 11 11
#11 12 12 12 12 12 12
#12  9  9  9  9  9  9

Here is the sample data (mydf) for reproducing this:

mydf<-structure(c(7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 
1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 
9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 
6, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 6, 10, 11, 12, 9), .Dim = c(12L, 
6L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12"), c("1", "2", "3", "4", "5", "6")))

Here is a dataframe (mydata) I constructed showing which number should be replaced with which word/fruit:

mydata <- data.frame(nums = 1:12)
mydata$fruits <- c("apple", "pear", "orange", "melon", "banana", "grape",
                   "pineapple", "mango", "lemon", "kiwi", "guava", "peach")

I have tried looking through similarly named threads, but they mainly discuss changing certain parts of dataframes (e.g. specific variables or specific observations), not the contents of the whole dataframe.

I tried using multiple gsub commands, but this doesn't work for several reasons. I guess I need a function that applies across all variables in the df, but I'm not sure which.

The final result should look something like this:

      1           2           3           4           5           6          
1  "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple"
2  "banana"    "banana"    "banana"    "banana"    "banana"    "banana"   
3  "melon"     "melon"     "melon"     "melon"     "melon"     "melon"    
4  "mango"     "mango"     "mango"     "mango"     "mango"     "mango"    
5  "apple"     "apple"     "apple"     "apple"     "apple"     "apple"    
6  "pear"      "pear"      "pear"      "pear"      "pear"      "pear"     
7  "grape"     "grape"     "grape"     "grape"     "orange"    "orange"   
8  "orange"    "orange"    "orange"    "orange"    "grape"     "grape"    
9  "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"     
10 "guava"     "guava"     "guava"     "guava"     "guava"     "guava"    
11 "peach"     "peach"     "peach"     "peach"     "peach"     "peach"    
12 "lemon"     "lemon"     "lemon"     "lemon"     "lemon"     "lemon"

Ideally the quote marks would not be visible, though I'm not sure whether that is possible.

Upvotes: 3

Views: 677

Answers (4)

Rich Scriven

Reputation: 99391

replace might work for you here.

replace(mydf, seq_along(mydf), mydata[[2]][mydf])
#    1           2           3           4           5           6          
# 1  "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple"
# 2  "banana"    "banana"    "banana"    "banana"    "banana"    "banana"   
# 3  "melon"     "melon"     "melon"     "melon"     "melon"     "melon"    
# 4  "mango"     "mango"     "mango"     "mango"     "mango"     "mango"    
# 5  "apple"     "apple"     "apple"     "apple"     "apple"     "apple"    
# 6  "pear"      "pear"      "pear"      "pear"      "pear"      "pear"     
# 7  "grape"     "grape"     "grape"     "grape"     "orange"    "orange"   
# 8  "orange"    "orange"    "orange"    "orange"    "grape"     "grape"    
# 9  "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"     
# 10 "guava"     "guava"     "guava"     "guava"     "guava"     "guava"    
# 11 "peach"     "peach"     "peach"     "peach"     "peach"     "peach"    
# 12 "lemon"     "lemon"     "lemon"     "lemon"     "lemon"     "lemon"   

The result can be wrapped in as.data.frame() to remove the quotes if necessary.
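To illustrate why this works, here is a minimal sketch on a smaller matrix. The names fruits, m and res are made up for the example. replace(x, list, values) assigns values into the positions given by list and returns the modified copy, so passing seq_along(m) overwrites every cell while the matrix shape is kept.

```r
# Hypothetical small lookup vector and numeric matrix (not the question's data)
fruits <- c("apple", "pear", "orange")
m <- matrix(c(2, 3, 1, 1), nrow = 2)

# fruits[m] uses each cell value as an index into the lookup vector;
# replace() writes those strings back into every position of m
res <- replace(m, seq_along(m), fruits[m])

# The result is a character matrix with the same dimensions as m
dim(res)

# Converting to a data frame prints the strings without quote marks
as.data.frame(res)
```

Note that this indexing shortcut assumes the numbers in the matrix are exactly the positions of the matching words in the lookup vector.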

Upvotes: 0

Tyler Rinker

Reputation: 110072

Another possible approach:

library(qdapTools)
as.data.frame(apply(mydf, 2, lookup, mydata))

##            1         2         3         4         5         6
## 1  pineapple pineapple pineapple pineapple pineapple pineapple
## 2     banana    banana    banana    banana    banana    banana
## 3      melon     melon     melon     melon     melon     melon
## 4      mango     mango     mango     mango     mango     mango
## 5      apple     apple     apple     apple     apple     apple
## 6       pear      pear      pear      pear      pear      pear
## 7      grape     grape     grape     grape    orange    orange
## 8     orange    orange    orange    orange     grape     grape
## 9       kiwi      kiwi      kiwi      kiwi      kiwi      kiwi
## 10     guava     guava     guava     guava     guava     guava
## 11     peach     peach     peach     peach     peach     peach
## 12     lemon     lemon     lemon     lemon     lemon     lemon

Upvotes: 0

Matthew Lundberg

Reputation: 42689

As the fruits are in the correct order and are indexed by 1:12, you can use the entries of mydf to index into mydata$fruits:

apply(mydf, 2, function(x) mydata$fruits[x])

If the lookup values are not in the correct order, or do not cover every integer from 1 upward (have "holes"), you can use a factor to translate:

apply(mydf, 2, function(x) factor(x, levels=mydata$nums, labels=mydata$fruits))
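As a rough sketch of the "holes" case, consider a lookup table whose keys are unordered and non-contiguous. The names lut, m and by_factor are hypothetical, and the explicit as.character() is an addition here to make the conversion from factor to string visible rather than relying on apply to do it.

```r
# Hypothetical lookup table: keys are unordered and have gaps
lut <- data.frame(nums   = c(20, 5, 10),
                  fruits = c("kiwi", "banana", "pear"),
                  stringsAsFactors = FALSE)

# The value 7 in the last cell has no entry in lut
m <- matrix(c(5, 10, 20, 10, 5, 7), nrow = 3)

# Direct indexing (lut$fruits[x]) would be wrong here, because the
# cell values are no longer positions in the lookup vector.
# factor() maps each value to its label by matching against levels.
by_factor <- apply(m, 2, function(x)
  as.character(factor(x, levels = lut$nums, labels = lut$fruits)))

by_factor
```

Values with no entry in the lookup table come back as NA, which makes missing mappings easy to spot.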

Upvotes: 0

jbaums

Reputation: 27408

You can do this with match, which returns, for each element of one vector, its position in a lookup vector (here mydata$nums). Those positions are then used to index mydata$fruits.

mydf[] <- mydata$fruits[match(mydf, mydata$nums)]
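The two steps in that line can be separated out in a small sketch. The objects nums, fruits, m, pos and out are illustrative names, not part of the question's data.

```r
# Hypothetical lookup vectors and a small numeric matrix with dimnames
nums   <- c(1, 2, 3, 4)
fruits <- c("apple", "pear", "orange", "melon")
m <- matrix(c(3, 1, 4, 2), nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y")))

# Step 1: for every cell of m, find its position within nums
pos <- match(m, nums)

# Step 2: assign with empty brackets so only the contents are replaced;
# dim and dimnames of the matrix are kept
out <- m
out[] <- fruits[pos]

out
```

Assigning with out[] rather than out is what keeps the result a matrix; a plain assignment would leave an unshaped character vector.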

If you coerce to a data.frame, quotes aren't visible when you print the object to screen:

as.data.frame(mydf)

#            1         2         3         4         5         6
# 1  pineapple pineapple pineapple pineapple pineapple pineapple
# 2     banana    banana    banana    banana    banana    banana
# 3      melon     melon     melon     melon     melon     melon
# 4      mango     mango     mango     mango     mango     mango
# 5      apple     apple     apple     apple     apple     apple
# 6       pear      pear      pear      pear      pear      pear
# 7      grape     grape     grape     grape    orange    orange
# 8     orange    orange    orange    orange     grape     grape
# 9       kiwi      kiwi      kiwi      kiwi      kiwi      kiwi
# 10     guava     guava     guava     guava     guava     guava
# 11     peach     peach     peach     peach     peach     peach
# 12     lemon     lemon     lemon     lemon     lemon     lemon    

Whether or not you coerce to data.frame, you can supply quote=FALSE to write.table or write.csv to prevent quotes appearing around character strings in the exported file.
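A short sketch of both options follows, using a made-up character matrix m and a temporary file so that nothing is written to the working directory. print(quote = FALSE) and noquote() are base R alternatives for on-screen display that are not mentioned above.

```r
# Hypothetical character matrix standing in for the converted mydf
m <- matrix(c("kiwi", "pear", "lemon", "guava"), nrow = 2,
            dimnames = list(c("1", "2"), c("1", "2")))

# On screen: suppress quote marks without converting to a data frame
print(m, quote = FALSE)
noquote(m)

# On export: quote = FALSE keeps quote marks out of the written file
tmp <- tempfile(fileext = ".csv")
write.csv(m, tmp, quote = FALSE)
csv_lines <- readLines(tmp)
csv_lines
```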

Upvotes: 4
