Amir

Reputation: 59

Can't modify the contents of specific columns with binary values

I have 10 columns in the data.table DataDia.

> head(DataDia[,c(7:16)])
    Type soin 01  Type soin 02 Type soin 03 Type soin 04 Type soin 05 Type soin 06 Type soin 07          Type soin 08              Type soin 09 Type soin 10
1: crème de jour                                  sérum                                        démaquillant à rincer
2:                                                             masque                           démaquillant à rincer
3:               crème de nuit                    sérum                    lotion
4:                                                 sérum                    lotion  eau florale
5: crème de jour                                  sérum                                                              démaquillant sans rinçage
6:               crème de nuit       huile        sérum

I want to apply a general function that converts the contents of just these columns to binary values: if a cell is empty it should be replaced by 0, otherwise by 1. So I wrote this code:

DataDia[,DataDia[,c(5:10)]:=lapply(colnames(DataDia[,c(5:10)]), function(x) {if (DataDia[,x]==""){0} else {1}})]                 

But I get this error:

Error in [.data.table(DataDia, , :=(DataDia[, c(7:16)], lapply(colnames(DataDia[, : LHS of := must be a symbol, or an atomic vector (column names or positions).

Note that I want to use data.table operations, and I don't understand why this doesn't work here.

Thank you in advance!

Upvotes: 0

Views: 59

Answers (1)

Charlotte R

Reputation: 146

First, a vocabulary point: your cells with "" are not empty cells, but cells containing an empty character string, which is in itself a value. "Empty cells" refers to missing values, which appear as NA in a table.

Usually, missing data should already be identified as such when loading the data into R (e.g. with the na.strings = argument of the read.table function). If you tell me how you loaded your data, I can help you with this.
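As a rough illustration of the na.strings idea, the sketch below reads a small inline table with base R's read.table and marks empty fields as NA at load time. The inline text and the column names id, soin1 and soin2 are made up for the example; how the real data was loaded is not known.

```r
# Hypothetical CSV content standing in for the real file (assumption).
csv_text <- "id,soin1,soin2
1,serum,
2,,lotion
3,serum,lotion"

# na.strings = "" asks read.table to treat empty fields as NA.
df <- read.table(text = csv_text, header = TRUE, sep = ",",
                 na.strings = "", stringsAsFactors = FALSE)

# Once blanks are real NAs, the 0/1 indicator is just a test for "not missing".
binary <- as.integer(!is.na(df$soin1))
print(df)
print(binary)
```

With the blanks stored as NA, the same indicator can be built with is.na() instead of a comparison to "".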

As for your code, I would go for something much simpler:

DataDia[,5:10] <- data.table(0+ !(DataDia[,5:10] == ""))

NB: The 0 + part is used here to obtain a numeric value: 0 for FALSE and 1 for TRUE. The exclamation mark negates the written condition (we want it to return FALSE, i.e. 0, when the cell is ""). You need the data.table function because the comparison returns a matrix, and matrices do not seem to coerce correctly to data.table.
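To make the three steps of that expression concrete, here is a small breakdown on a plain character vector. The vector x is an invented example, not part of the original data.

```r
# Invented example vector: one empty string between two non-empty values.
x <- c("a", "", "c")

is_empty  <- x == ""        # TRUE where the cell holds an empty string
not_empty <- !is_empty      # negation: TRUE where the cell holds a value
as_number <- 0 + not_empty  # arithmetic coerces the logicals to 0/1

print(is_empty)
print(not_empty)
print(as_number)

# An equivalent spelling that yields integers instead of doubles.
as_int <- as.integer(x != "")
print(as_int)
```

Both spellings give 1 for filled cells and 0 for empty strings; as.integer() simply makes the storage type explicit.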

Here is the code working on a sample dataset:

> DataDia
    Produit1 Produit2 Produit3 Produit4
 1:                 b        c        d
 2:        a        b        c         
 3:        a                 c        d
 4:        a        b                 d
 5:        a                 c        d
 6:        a        b        c        d
 7:                 b        c        d
 8:        a        b        c        d
 9:        a        b                 d
10:        a        b        c        d

> DataDia[,2:3] <- data.table(0+ !(DataDia[,2:3] == ""))
> DataDia
    Produit1 Produit2 Produit3 Produit4
 1:                 1        1        d
 2:        a        1        1         
 3:        a        0        1        d
 4:        a        1        0        d
 5:        a        0        1        d
 6:        a        1        1        d
 7:                 1        1        d
 8:        a        1        1        d
 9:        a        1        0        d
10:        a        1        1        d
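Since the question asks for data.table operations, here is a hedged sketch of the same conversion done by reference with :=. The error message in the question says the left-hand side of := must be a symbol or a vector of column names or positions, so the sketch passes a character vector of names wrapped in parentheses. The three-row table and the choice of columns are assumptions made for illustration.

```r
library(data.table)

# Small stand-in for the sample table above (assumed data, three rows only).
DataDia <- data.table(
  Produit1 = c("", "a", "a"),
  Produit2 = c("b", "b", ""),
  Produit3 = c("c", "c", "c"),
  Produit4 = c("d", "", "d")
)

# Columns to convert, given as names so they can be used on the LHS of :=.
cols <- c("Produit2", "Produit3")

# lapply() over .SD applies the vectorised test to each selected column;
# (cols) := ... writes the results back in place.
DataDia[, (cols) := lapply(.SD, function(x) as.integer(x != "")),
        .SDcols = cols]

print(DataDia)
```

The comparison x != "" is vectorised, so no if/else is needed inside the function. If a column already contains NA, the result for that cell stays NA, which may or may not be what is wanted.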

Upvotes: 3
