Reputation: 15
I have a datatable in R with a column that contains multiple values separated by a comma and I would like to extract the last value from each row in that column. However, the number of values in the different rows vary. Some times there are 3 or 4 values separated by commas and sometimes there is only one. I always need the last value.
The rows look basically like this.
[1.848]
[0.207,0.403]
[0.3,0.29]
[0.948,3.064]
[0.431,0.431,0.214]
[0.305,0.305,0.30]
[0.213,0.42]
[0.31,0.30]
[1.484,2.433]
That would be my preferred outcome:
1.848
0.403
0.29
3.064
0.214
0.30
0.42
0.30
2.433
Upvotes: 1
Views: 207
Reputation: 388982
A base R approach using sub
to extract the last number within []
.
df$col2 <- as.numeric(sub('\\[.*(\\d+\\.\\d+)\\]', '\\1', df$col1))
df
# col1 col2
#1 [1.848] 1.848
#2 [0.207,0.403] 0.403
#3 [0.3,0.29] 0.290
#4 [0.948,3.064] 3.064
#5 [0.431,0.431,0.214] 0.214
#6 [0.305,0.305,0.30] 0.300
#7 [0.213,0.42] 0.420
#8 [0.31,0.30] 0.300
#9 [1.484,2.433] 2.433
If there is no []
in your actual data, this would work.
df$col2 <- as.numeric(sub('.*(\\d+\\.\\d+)', '\\1', df$col1))
Upvotes: 0
Reputation: 78927
We could use separate_rows
:
library(dplyr)
library(tidyr)
df1 %>%
mutate(id = row_number()) %>%
separate_rows(col1) %>%
type.convert(as.is = TRUE) %>%
na.omit () %>%
group_by(id) %>%
filter(col1 == last(col1)) %>%
ungroup() %>%
select(-id)
output:
col1
<dbl>
1 1.85
2 0.403
3 0.29
4 3.06
5 0.214
6 0.3
7 0.42
8 0.3
9 2.43
Upvotes: 1
Reputation: 887128
We may use trimws
from base R
df1$col1 <- as.numeric(trimws(df1$col1, whitespace = ".*,|[][]"))
-output
> df1
col1
1 1.848
2 0.403
3 0.290
4 3.064
5 0.214
6 0.300
7 0.420
8 0.300
9 2.433
df1 <- structure(list(col1 = c("[1.848]", "[0.207,0.403]", "[0.3,0.29]",
"[0.948,3.064]", "[0.431,0.431,0.214]", "[0.305,0.305,0.30]",
"[0.213,0.42]", "[0.31,0.30]", "[1.484,2.433]")),
class = "data.frame", row.names = c(NA,
-9L))
Upvotes: 1
Reputation: 8811
Here a solution that I use ,
as reference to get the last number.
library(dplyr)
library(stringr)
x <- c("1.848", "0.207,0.403", "0.3,0.29", "0.948,3.064", "0.431,0.431,0.214",
"0.305,0.305,0.30", "0.213,0.42", "0.31,0.30", "1.484,2.433")
df <- tibble(x = x)
last_number <- function(x){
as.numeric(word(x,str_count(x,",") + 1,sep = ","))
}
df %>%
mutate(last_number = last_number(x))
# A tibble: 9 x 2
x last_number
<chr> <dbl>
1 1.848 1.85
2 0.207,0.403 0.403
3 0.3,0.29 0.29
4 0.948,3.064 3.06
5 0.431,0.431,0.214 0.214
6 0.305,0.305,0.30 0.3
7 0.213,0.42 0.42
8 0.31,0.30 0.3
9 1.484,2.433 2.43
Upvotes: 0