Replace character vector elements in dataframe with values in R

Question

I have a data frame with many character columns. The columns contain a mix of empty strings ("") and non-empty strings. I would like to replace every empty string in the data frame with 0 and every non-empty string with 1, but I can't figure out how to do it.

A simple example to illustrate:

> df
       A   B  C
1: asdad       
2:           sd
3:    as  sd sd
4: daasd  sd   
5:        sd   
6:           sd
7:    ds sds   
8:   asd       
9:        sd sd

> str(df)
Classes ‘data.table’ and 'data.frame':  9 obs. of  3 variables:
 $ A: chr  "asdad" "" "as" "daasd" ...
 $ B: chr  "" "" "sd" "sd" ...
 $ C: chr  "" "sd" "sd" "" ...
 - attr(*, ".internal.selfref")=<externalptr> 

Desired output:

> df
   A B C
1: 1 0 0
2: 0 0 1
3: 1 1 1
4: 1 1 0
5: 0 1 0
6: 0 0 1
7: 1 1 0
8: 1 0 0
9: 0 1 1

> str(df)
Classes ‘data.table’ and 'data.frame':  9 obs. of  3 variables:
 $ A: int  1 0 1 1 0 0 1 1 0
 $ B: int  0 0 1 1 1 0 1 0 1
 $ C: int  0 1 1 0 0 1 0 0 1
 - attr(*, ".internal.selfref")=<externalptr> 
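To make the example reproducible, here is a sketch that rebuilds the sample data as a plain base-R data.frame. This is an assumption for convenience: the question uses a data.table, which needs the data.table package. The values are read off the printed table and the str() output above.

```r
# Sample input from the question, rebuilt as a base-R data.frame.
# Empty cells are "" (empty strings), not NA.
df <- data.frame(
  A = c("asdad", "", "as", "daasd", "", "", "ds", "asd", ""),
  B = c("", "", "sd", "sd", "sd", "", "sds", "", "sd"),
  C = c("", "sd", "sd", "", "", "sd", "", "", "sd"),
  stringsAsFactors = FALSE  # keep character columns on R versions before 4.0
)
str(df)
```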

David Arenburg · Accepted Answer

Here's a simple vectorized solution:

(df != "") + 0
#    A B C
# 1: 1 0 0
# 2: 0 0 1
# 3: 1 1 1
# 4: 1 1 0
# 5: 0 1 0
# 6: 0 0 1
# 7: 1 1 0
# 8: 1 0 0
# 9: 0 1 1

If you have a data.table object and want a data.table back, wrap the result in as.data.table:

as.data.table((df != "") + 0)

Some explanations

When you run df != "", R compares each value in df with "" (the empty string) and returns a logical matrix: TRUE where the value is not "" and FALSE where it is. Adding 0 coerces the logical values to numbers, so TRUE becomes 1 and FALSE becomes 0.
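The two steps can be seen separately in the base-R sketch below, which rebuilds the question's data as a plain data.frame (an assumption for reproducibility). One detail worth knowing: adding 0 produces double values, while adding the integer literal 0L produces the integer columns shown in the desired str() output.

```r
df <- data.frame(
  A = c("asdad", "", "as", "daasd", "", "", "ds", "asd", ""),
  B = c("", "", "sd", "sd", "sd", "", "sds", "", "sd"),
  C = c("", "sd", "sd", "", "", "sd", "", "", "sd"),
  stringsAsFactors = FALSE
)

# Step 1: element-wise comparison gives a logical matrix,
# TRUE where the cell is not "" and FALSE where it is.
is_filled <- df != ""

# Step 2: arithmetic coerces TRUE/FALSE to 1/0.
as_double  <- is_filled + 0   # storage mode "double"
as_integer <- is_filled + 0L  # storage mode "integer"

# Turn the matrix back into a data.frame with integer columns.
result <- as.data.frame(as_integer)
str(result)
```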

Edit:

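If you have a data.table object and want to update it by reference (without making a copy), you could do the following. Note that this version also treats a single space " " as empty: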

library(data.table)
df[, names(df) := lapply(.SD, function(x) (!x %in% c("", " ")) + 0)]
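Without data.table, a comparable column-by-column replacement can be sketched in base R: assigning to df[] replaces every column while keeping the object a data.frame. This swaps the %in% c("", " ") test for nzchar(trimws(x)), under the assumption that any whitespace-only string should also count as empty; flag_nonblank is a hypothetical helper name, and trimws() requires R 3.2.0 or later. Unlike :=, this is ordinary copy-on-modify assignment, not an update by reference.

```r
df <- data.frame(
  A = c("asdad", "", "as", "daasd", "", "", "ds", "asd", ""),
  B = c("", "", "sd", "sd", "sd", "", "sds", "", "sd"),
  C = c("", "sd", "sd", "", "", "sd", "", "", "sd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1L for a non-blank string, 0L for "" or whitespace only.
flag_nonblank <- function(x) as.integer(nzchar(trimws(x)))

# df[] <- keeps df a data.frame; plain df <- lapply(...) would return a list.
df[] <- lapply(df, flag_nonblank)
str(df)
```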
