Reputation: 931
Line breaks are always the first thing I clean, so that an embedded newline does not split a record into a new row when using "import text file" in Excel, or when exporting a CSV file for import into another application. (The same solution might also apply to cleaning other special characters in a dataset.)
dt[,lapply(.SD,gsub("\\n","",.SD))]
R froze after running this on a table with 50+ columns and 3+ million rows.
What's wrong with the lapply approach above? And what is the preferred way to clean something across an entire table?
Upvotes: 1
Views: 397
Reputation: 34703
chinsoon12's comment basically covers it: use set for a low-overhead, by-reference column overwrite, and add fixed=TRUE so that gsub does a plain string match instead of a regex match, which is faster too:
for (jj in seq_len(ncol(dt))) set(dt, , jj, gsub('\n', '', dt[[jj]], fixed = TRUE))
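As for what goes wrong in the question's call: gsub("\\n","",.SD) is evaluated once on the whole .SD instead of being passed to lapply as a function, so lapply never receives a function to apply. gsub also coerces its list input to character first, which is probably what makes R hang on a table that size, though I have not profiled it. Below is a minimal sketch of the set loop on a toy table. The table and its column names (id, note, tag) are made up for illustration, and restricting the loop to character columns is my own addition, on the assumption that you want numeric columns left untouched (gsub would otherwise turn them into character).

```r
library(data.table)

# Hypothetical toy table; the column names are invented for this example.
dt <- data.table(
  id   = 1:3,
  note = c("one\nline", "no break", "two\n\nbreaks"),
  tag  = c("a\n", "b", "c")
)

# Only character columns can hold newlines, so skip everything else.
# (Assumption: non-character columns should keep their type.)
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]

# Overwrite each character column by reference.
# fixed = TRUE matches the literal newline without using the regex engine.
for (jj in chr_cols) {
  set(dt, j = jj, value = gsub("\n", "", dt[[jj]], fixed = TRUE))
}

print(dt)
```

If you do want the lapply form, the second argument has to be a function, e.g. dt[, lapply(.SD, function(x) gsub("\n", "", x, fixed = TRUE))], but that builds a new table rather than updating in place.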
BTW, \\n is different from \n. \n is the literal newline character; \\n is the string "\n", i.e., a backslash followed by n. You can see the difference thus:
cat('hey\nyou')
# hey
# you
cat('hey\\nyou')
# hey\nyou
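This distinction matters once fixed=TRUE is in play. A small sketch, with variable names of my own choosing: in the default regex mode the pattern "\\n" still removes newlines, because the regex engine itself reads backslash-n as a newline escape, whereas with fixed = TRUE the same pattern looks for a literal backslash followed by n and leaves real newlines alone.

```r
x <- "hey\nyou"    # contains one real newline character

nchar("\n")        # 1: a single newline character
nchar("\\n")       # 2: a backslash and the letter n

# Regex mode: the regex engine interprets \n as a newline, so this matches.
r_regex <- gsub("\\n", "", x)

# Fixed mode with "\\n": searches for backslash + n, which x does not contain.
r_fixed_wrong <- gsub("\\n", "", x, fixed = TRUE)

# Fixed mode with "\n": searches for the newline character itself.
r_fixed_right <- gsub("\n", "", x, fixed = TRUE)

r_regex         # "heyyou"
r_fixed_wrong   # "hey\nyou" (unchanged)
r_fixed_right   # "heyyou"
```

So when adding fixed = TRUE to an existing gsub call, the pattern needs to change from "\\n" to "\n" as well.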
Upvotes: 2