Reputation:
I use separate()
function to split column: Enterdateofexam2, it is character format, the value is like "25.07","13.09","16.06"... My aim is splitting it into day(25) and month(07), then use convert = true to turn them into numeric for filter in next step.
My code is:
jimma3n <- jimma3 %>%
select(Enterdateofexam2, Enterdayofexam, UniqueKey,MEDICALRECORD)%>%
separate(Enterdateofexam2,into=c("day", "month"), sep=".", convert = TRUE)
view (jimma3n)
but the R keeps warning me like:
Expected 2 pieces. Additional pieces discarded in 4088 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
So could anyone help to figure out which part is wrong about my code? Thanks~~!
Upvotes: 1
Views: 11078
Reputation: 79311
The main issue is that you have to define the number of columns you will separate to. If you define 2 columns let's say a
and b
and you have 3 elements to separate: let's say x
y
z
, then z
will be discarded.
With separate
you must define the new columns, which is difficult if you don't know how many columns you will need after separate
Consider this example: in row 3 you have 3 elements:
df <- data.frame(x = c("x", "x y", "x y z", NA))
x
1 x
2 x y
3 x y z
4 <NA>
With this code you define 2 columns to separate to
df %>% separate(x, c("a", "b"))
a b
1 x <NA>
2 x y
3 x y
4 <NA> <NA>
In row 3 z
is discarded because we have define only 2 columns a
and b
If we define 3 columns like
df %>% separate(x, c("a", "b", "c"))
The warning for discarding will disappear.
On the other hand you will get a warning for x with fewer then 3 elements, those will be filled with NA.
Upvotes: 1
Reputation: 887951
We can use extra
argument. Also, by default, the sep
is in regex
mode - according to ?separate
documentation
sep - If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.
and .
is a metacharacter which can match any character. Therefore, we may need to either escape (\\.
) or place it in square brackets ([.]
). Also, based on the dput
, the column is a list
, which should be unnest
ed first before doing the separate
library(dplyr)
library(tidyr)
jimma3 %>%
select(Enterdateofexam2, Enterdayofexam, UniqueKey,MEDICALRECORD)%>%
unnest(Enterdateofexam2) %>%
separate(Enterdateofexam2,into=c("day", "month"),
sep="\\.", convert = TRUE, extra = "merge") %>%
na.omit
-output
# A tibble: 6 x 5
day month Enterdayofexam UniqueKey MEDICALRECORD
<int> <int> <chr> <chr> <chr>
1 7 6 1 530 577207
2 8 6 2 530 577207
3 9 6 3 530 577207
4 2 12 1 531 575333
5 3 12 2 531 575333
6 4 12 3 531 575333
Basically, with sep = "."
, it is splitting at every character element and thus the warning popped up
jimma3 <- structure(list(Enterdateofexam2 = list(c("", "7.06"), c("", "8.06"
), c("", "9.06"), c("", "2.12"), c("", "3.12"), c("", "4.12")),
Enterdayofexam = c("1", "2", "3", "1", "2", "3"), UniqueKey = c("530",
"530", "530", "531", "531", "531"), MEDICALRECORD = c("577207",
"577207", "577207", "575333", "575333", "575333")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 5