I have been using the separate() function from the tidyverse (library(tidyverse)) to split values like these into different columns:
45 (10, 89)
34
and with the code:
dd %>% separate(a, c("x","y","z"), extra="drop")
I got what I wanted:
45 10 89
34
But now my variable has a different format, and it no longer works:
45% (10,89)
34%
Why does it not work when the values contain the '%' symbol?
Edit: OK, I know why it is not working: it is because of the decimal point in my data:
4.5% (10/89)
3.4%
6.7%
7.8% (89/98)
How do you handle decimals with the separate() function? Thank you very much!
Upvotes: 1
Views: 804
Reputation: 160607
I'm inferring that when you say "is not working", it's because the percent sign is being removed:
separate(data_frame(a=c("45 (10, 89)","34")), a, c('x','y','z'), extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
# x y z
# * <chr> <chr> <chr>
# 1 45 10 89
# 2 34 <NA> <NA>
separate(data_frame(a=c("45% (10, 89)","34%")), a, c('x','y','z'), extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
# x y z
# * <chr> <chr> <chr>
# 1 45 10 89
# 2 34 <NA>
From ?separate:
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) ...
Since you are not overriding the default of sep, it splits on anything that is not a letter or a number. FYI, [^[:alnum:]]+ is analogous to [^A-Za-z0-9]+, which matches "1 or more characters that are not in the character ranges A-Z, a-z, or 0-9".
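To see what a given pattern treats as a separator, it can help to list its matches directly. A minimal sketch in base R; regmatches() and gregexpr() are not used by the answer itself and only serve here to inspect the regex:

```r
# Sample value from the question.
x <- "45% (10, 89)"

# Every run of characters matched by separate()'s default sep,
# i.e. what is treated as a delimiter and discarded.
default_matches <- regmatches(x, gregexpr("[^[:alnum:]]+", x))[[1]]
print(default_matches)
# matches: "% (", ", ", ")"

# Listing % inside the negated class stops it from counting as a delimiter.
custom_matches <- regmatches(x, gregexpr("[^[:alnum:]%]+", x))[[1]]
print(custom_matches)
# matches: " (", ", ", ")"
```

With the default pattern the percent sign is part of the first delimiter, which is why it disappears from the output.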
Simply provide a more detailed sep, and you'll get what you want.
separate(data_frame(a=c("45% (10, 89)","34%")), a, c('x','y','z'), sep="[^[:alnum:]%]+", extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
# x y z
# * <chr> <chr> <chr>
# 1 45% 10 89
# 2 34% <NA> <NA>
Edit: using your most recent sample data:
separate(data_frame(a=c("45% (10/89)","34%","","67%","78% (89/98)")), a, c('x','y','z'), sep="[^[:alnum:]%]+", extra="drop")
# Warning: Too few values at 3 locations: 2, 3, 4
# # A tibble: 5 × 3
# x y z
# * <chr> <chr> <chr>
# 1 45% 10 89
# 2 34% <NA> <NA>
# 3 <NA> <NA>
# 4 67% <NA> <NA>
# 5 78% 89 98
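For the decimal values in the edited question (4.5%, 3.4%, and so on), the same idea can probably be extended by also keeping the dot out of the delimiter set. This is only a sketch: it assumes the decimal mark is always "." and that a dot never appears as an intended delimiter in your real data. fill="right" is an addition here, used just to silence the "too few values" warning.

```r
library(tidyr)

# Sample data from the edited question.
dd <- data.frame(
  a = c("4.5% (10/89)", "3.4%", "6.7%", "7.8% (89/98)"),
  stringsAsFactors = FALSE
)

# Both % and . are listed inside the negated class,
# so neither one is treated as a delimiter.
res <- separate(dd, a, c("x", "y", "z"),
                sep = "[^[:alnum:]%.]+",
                extra = "drop",   # discard any pieces beyond the third
                fill = "right")   # pad short rows with NA, without a warning
print(res)
# x: "4.5%" "3.4%" "6.7%" "7.8%"
# y: "10"   NA     NA     "89"
# z: "89"   NA     NA     "98"
```

All three columns come back as character; if you need numbers, you would still have to strip the % from x and convert afterwards.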
Upvotes: 3