Reputation: 413
If I have a string,
x <- "Hello World"
How can I access the second word, "World", using string split, after
x <- strsplit(x, " ")
x[[2]] does not do anything.
Upvotes: 22
Views: 37539
Reputation: 7400
As vapply() isn't mentioned, I would like to add it:
c("Hello world", "Hi there", "Back at ya") |>
strsplit(split = " ") |>
vapply(FUN = "[", FUN.VALUE = character(1L), 2L)
#> [1] "world" "there" "at"
Created on 2023-11-30 with reprex v2.0.2
Assumption: using strsplit(x, " ") is mandatory.
vapply() is generally preferred over sapply() because the type and length of each result are fixed by FUN.VALUE (reference).
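One case where the difference shows up is empty input. The following is a minimal sketch, assuming a hypothetical empty character vector named empty; sapply() falls back to returning a list, while vapply() still returns a character vector.

```r
# Hypothetical edge case: nothing to split.
empty <- character(0)
parts <- strsplit(empty, split = " ")   # an empty list

# sapply() cannot guess the result type and returns an empty list.
res_sapply <- sapply(parts, "[", 2L)

# vapply() is told the result type up front, so it returns character(0).
res_vapply <- vapply(parts, "[", FUN.VALUE = character(1L), 2L)

class(res_sapply)
#> [1] "list"
class(res_vapply)
#> [1] "character"
```

Code that later does string operations on the result therefore behaves the same for empty and non-empty input when vapply() is used.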
Upvotes: 0
Reputation: 101317
Probably you can play with regex using sub:
> x <- c("Hello world", "Hi there", "Back at ya")
> sub(".*?\\W+(\\w+).*","\\1",x)
[1] "world" "there" "at"
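One thing to keep in mind with this approach: sub() returns the input unchanged when the pattern does not match, so a one-word string comes back as itself rather than as NA. A rough sketch of how that could be handled, assuming a hypothetical extra element "Hello" and adding perl = TRUE (my choice, not part of the answer above) so the lazy quantifier is handled by the PCRE engine:

```r
x <- c("Hello world", "Hi there", "Back at ya", "Hello")
pattern <- ".*?\\W+(\\w+).*"

# Plain sub(): the unmatched element "Hello" is returned as-is.
second_raw <- sub(pattern, "\\1", x, perl = TRUE)
second_raw
#> [1] "world" "there" "at"    "Hello"

# Replace non-matching elements with NA so they are not mistaken for a second word.
matched <- grepl(pattern, x, perl = TRUE)
second <- ifelse(matched, second_raw, NA_character_)
second
#> [1] "world" "there" "at"    NA
```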
Upvotes: 0
Reputation: 51974
With stringr 1.5.0, you can use str_split_i to access the ith element of a split string:
library(stringr)
x <- "Hello World"
str_split_i(x, " ", i = 2)
#[1] "World"
It is vectorized:
x <- c("Hello world", "Hi there", "Back at ya")
str_split_i(x, " ", 2)
#[1] "world" "there" "at"
Upvotes: 5
Reputation: 460
Another approach that might be a little easier to read and apply to a data frame within a pipeline (though it takes more lines) would be to wrap it in your own function and apply that.
library(tidyverse)
df <- data.frame(
  greetings = c("Hello world", "Hi there", "Back at ya")
)
split_params <- function(x, sep, n) {
  # Splits the string x into substrings separated by 'sep'
  # and returns the nth substring.
  strsplit(x, sep)[[1]][n]
}
df <- df %>%
  mutate(
    # Store the result in a new column so 'greetings' is kept.
    second_word = sapply(
      X = greetings,
      FUN = split_params,
      # Arguments passed on to split_params.
      sep = " ",
      n = 2,
      USE.NAMES = FALSE
    )
  )
df
### (Output in RStudio Notebook)
greetings     second_word
<chr>         <chr>
Hello world   world
Hi there      there
Back at ya    at
3 rows
###
Upvotes: 0
Reputation: 1
> x = strsplit("a;b;c;d", ";")
> x
[[1]]
[1] "a" "b" "c" "d"
> x = as.character(x[[1]])
> x
[1] "a" "b" "c" "d"
> x = strsplit(x, " ")
> x
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
[[4]]
[1] "d"
Upvotes: -3
Reputation: 5580
As mentioned in the comments, it's important to realise that strsplit returns a list object. Since your example only splits a single item (a vector of length 1), your list has length 1. I'll explain with a slightly different example, inputting a vector of length 3 (3 text items to split):
input <- c( "Hello world", "Hi there", "Back at ya" )
x <- strsplit( input, " " )
> x
[[1]]
[1] "Hello" "world"
[[2]]
[1] "Hi" "there"
[[3]]
[1] "Back" "at" "ya"
Notice that the returned list has 3 elements, one for each element of the input vector. Each of those list elements holds the pieces produced by the strsplit call. So we can recall any of these list elements using [[ (this is what your x[[2]] call was attempting, but your list had only one element, so there was no second element to return and R reports a "subscript out of bounds" error):
> x[[1]]
[1] "Hello" "world"
> x[[3]]
[1] "Back" "at" "ya"
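For the single string from the question, the same reasoning can be checked directly. This is a small sketch; the tryCatch() wrapper and the name msg are only there so the error can be shown without stopping the script.

```r
x <- strsplit("Hello World", " ")

# One input string gives a list with exactly one element.
length(x)
#> [1] 1
x[[1]]
#> [1] "Hello" "World"

# There is no second list element, so x[[2]] raises an error.
msg <- tryCatch(x[[2]], error = function(e) conditionMessage(e))
msg
#> [1] "subscript out of bounds"
```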
Now we can get the second part of any of those list elements by appending a [ call:
> x[[1]][2]
[1] "world"
> x[[3]][2]
[1] "at"
This returns the second item from the chosen list element (note that the "Back at ya" input has returned "at" in this case). You can do this for all items at once using something from the apply family. sapply will return a vector, which is probably what you want in this case:
> sapply( x, "[", 2 )
[1] "world" "there" "at"
The last value in the input here (2) is passed to the [ operator, meaning the operation x[2] is applied to every list element.
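If passing "[" as a function looks odd, it may help to see that [ is an ordinary function in R. A short sketch (the names words, by_name and by_lambda are just for illustration) showing that the call above is equivalent to using an anonymous function:

```r
input <- c("Hello world", "Hi there", "Back at ya")
x <- strsplit(input, " ")

# `[` can be called like any other function.
words <- x[[3]]
identical(words[2], `[`(words, 2))
#> [1] TRUE

# Passing "[" with the extra argument 2 ...
by_name <- sapply(x, "[", 2)
# ... does the same as spelling out the indexing in an anonymous function.
by_lambda <- sapply(x, function(parts) parts[2])

identical(by_name, by_lambda)
#> [1] TRUE
```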
If instead of the second item, you'd like the last item of each list element, we can use tail within the sapply call instead of [:
> sapply( x, tail, 1 )
[1] "world" "there" "ya"
This time, we've applied tail( x, 1 ) to every list element, giving us the last item.
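The two variants also differ when an input has only one word. A minimal sketch, assuming a hypothetical input containing the single word "Solo": indexing with 2 gives NA because there is no second item, while tail still returns the only word.

```r
input <- c("Hello world", "Solo", "Back at ya")
x <- strsplit(input, " ")

# Second word: missing for the one-word string, so NA is returned.
second <- sapply(x, "[", 2)
second
#> [1] "world" NA      "at"

# Last word: the one-word string returns its only word.
last <- sapply(x, tail, 1)
last
#> [1] "world" "Solo"  "ya"
```

Which behaviour is preferable depends on whether a one-word string should count as having a "last word" in your data.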
My preferred way to apply operations like these is with the magrittr pipe. For the second word:
x <- input %>%
strsplit( " " ) %>%
sapply( "[", 2 )
> x
[1] "world" "there" "at"
Or for the last word:
x <- input %>%
strsplit( " " ) %>%
sapply( tail, 1 )
> x
[1] "world" "there" "ya"
Upvotes: 38