Reputation: 13969
I have a data frame with many character columns. The columns contain a mix of empty strings and non-empty strings. I would like to replace every empty string in the data frame with 0 and every non-empty string with 1, but I can't figure out how to do it.
A simple example to illustrate:
> df
A B C
1: asdad
2: sd
3: as sd sd
4: daasd sd
5: sd
6: sd
7: ds sds
8: asd
9: sd sd
> str(df)
Classes ‘data.table’ and 'data.frame': 9 obs. of 3 variables:
$ A: chr "asdad" "" "as" "daasd" ...
$ B: chr "" "" "sd" "sd" ...
$ C: chr "" "sd" "sd" "" ...
- attr(*, ".internal.selfref")=<externalptr>
Wanted:
> df
A B C
1: 1 0 0
2: 0 0 1
3: 1 1 1
4: 1 1 0
5: 0 1 0
6: 0 0 1
7: 1 1 0
8: 1 0 0
9: 0 1 1
str(df)
Classes ‘data.table’ and 'data.frame': 9 obs. of 3 variables:
$ A: int 1 0 1 1 0 0 1 1 0
$ B: int 0 0 1 1 1 0 1 0 1
$ C: int 0 1 1 0 0 1 0 0 1
- attr(*, ".internal.selfref")=<externalptr>
Upvotes: 1
Views: 613
Reputation: 92292
Here's a simple vectorized solution:
(df != "") + 0
# A B C
# 1: 1 0 0
# 2: 0 0 1
# 3: 1 1 1
# 4: 1 1 0
# 5: 0 1 0
# 6: 0 0 1
# 7: 0 1 1
# 8: 1 0 0
# 9: 0 1 1
If you have a data.table object, wrap the result in as.data.table, as in
as.data.table((df != "") + 0)
Some explanations
When you do df != "", R compares each value in df to "" (an empty string) and returns a logical matrix of TRUE and FALSE indicating, for each value, whether it differs from "". Adding + 0 then converts the logical values to 1 and 0.
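To make the two steps concrete, here is a minimal base R sketch on a small made-up data frame (the name toy and its values are hypothetical, not the asker's data). It also shows that adding 0 gives double values, while adding 0L keeps them integer, which may matter if you want int columns as in the question.

```r
# Hypothetical three-row data frame of character columns (not the asker's data).
toy <- data.frame(A = c("x", "", "y"),
                  B = c("", "", "z"),
                  stringsAsFactors = FALSE)

is_filled <- toy != ""       # logical matrix: TRUE where the string is non-empty
flags_dbl <- is_filled + 0   # logical + double 0 gives a double matrix of 1/0
flags_int <- is_filled + 0L  # logical + integer 0L gives an integer matrix of 1/0

storage.mode(flags_dbl)
storage.mode(flags_int)
```

Either result is a matrix, so it still needs as.data.frame or as.data.table if you want a table back.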
Edit:
If you have a data.table object and want to update it by reference, you could do
df[, names(df) := lapply(.SD, function(x) (!x %in% c("", " ")) + 0)]
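As a rough sketch of the same by-reference update on made-up data (dt and its values are hypothetical), the variant below swaps + 0 for as.integer() so the columns come out as int rather than numeric. Like the line above, it treats a single space as empty too.

```r
library(data.table)

# Hypothetical data (not the asker's table); note the single-space entry in A.
dt <- data.table(A = c("x", "", " "),
                 B = c("", "y", "z"))

# Overwrite every column in place. `!x %in% ...` parses as !(x %in% ...),
# and as.integer() turns TRUE/FALSE into 1L/0L.
dt[, names(dt) := lapply(.SD, function(x) as.integer(!x %in% c("", " ")))]

str(dt)
```

Because := modifies dt in place, there is no need to assign the result back to dt.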
Upvotes: 2