Reputation: 185
I have a database with the following structure. I want to separate the last column, a part number and the other with the description, separating by "-". In the example it works ok. But when I use this in my database (50,700 cases) I get an error message.
rut<-c("50001780", "50001810", "50001820",
"50001850", "50001890", "50001940")
econ.activ<-c("552010 - RESTAURANTES",
"930990 - OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.",
"523911 - COMERCIO AL POR MENOR DE ARTICULOS FOTOGRAFICOS",
"930990 - OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.",
"930990 - OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.",
"930990 - OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.")
df<-data.frame(rut,econ.activ)
df %>% separate(col = "econ.activ", into = c("folio", "descripción"), sep = "\\-")
This is the error message
Warning message: Too many values at 174 locations: 3645, 4401, 5118, 10013, 10018, 10023, 10905, 10907, 10921, 10923, 10928, 10930, 10935, 10937, 10942, 10944, 11586, 13556, 13557, 13614, ...
Thanks in advance.
Upvotes: 1
Views: 117
Reputation: 886938
There could be more than one -
in some rows and that may be the reason for the warning. One option is cSplit
from splitstackshape
. It will split the 'econ.activ' into 'n' number of columns based on the maximum number (i.e. 'n') of '-' in a particular row. By default, it will pad NA to columns if there are less number of '-' in certain rows.
library(splitstackshape)
cSplit(df, "econ.activ", sep="-")
If we need to reproduce the warning, just insert another -
in one of the elements
df$econ.activ <- as.character(df$econ.activ)
df$econ.activ[3] <- "930990 - OTRAS - SEP"
df %>%
separate(col = "econ.activ", into = c("folio", "descripción"), sep = "\\-")
---
---
Warning message: Too many values at 1 locations: 3
It is because the separate
have an argument extra
which is by default warn
. There are other options like drop
or merge
. If we use the drop
it remove the substring from the second -
onwards
df %>%
separate(col = "econ.activ", into = c("folio", "descripción"), sep = "\\-", extra="drop")
# rut folio descripción
#1 50001780 552010 RESTAURANTES
#2 50001810 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
#3 50001820 930990 OTRAS
#4 50001850 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
#5 50001890 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
#6 50001940 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
and with extra = "merge"
, it retains that substring in the "descripción" column. So, basically, it depends on what the OP wants as output
df %>%
separate(col = "econ.activ", into = c("folio", "descripción"),
sep = "\\-", extra="merge")
# rut folio descripción
#1 50001780 552010 RESTAURANTES
#2 50001810 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
#3 50001820 930990 OTRAS - SEP
#4 50001850 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
#5 50001890 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
#6 50001940 930990 OTRAS ACTIVIDADES DE SERVICIOS PERSONALES N.C.P.
Upvotes: 1