random_user_567

Create a list based on pattern in string vector

I'm trying to generate a list of information by location. Currently, I have a character vector of strings with the structure location1, information, information, information, location2, information, information, information. I want a list where each element is location1: information, information, information, etc.

I have tried to write a loop that identifies the locations in the data, but I don't understand how to dynamically join the information to its location (the locations and the number of information entries change, so the solution needs to be dynamic).

list_of_locations = list()
locations = c("location1","location2")
original_vector = c("location1","July 123","August 345", "September 678", "location2","July 123","August 345")

for (word in original_vector){
  if(word %in% locations){
    list_of_locations[[word]] = word 
  } else {
    list_of_locations[[word]] = word
  }
}

I'm looking for a list:

1: location1, July 123, August 345, September 678
2: location2, July 123, August 345...
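For reference, the loop from the question can be made to work by remembering the most recent location and appending each non-location string to it. This is a minimal sketch, not the accepted answer's method; it assumes location names are unique and that anything before the first location can be ignored. The variable `current` is a hypothetical helper name.

```r
locations <- c("location1", "location2")
original_vector <- c("location1", "July 123", "August 345", "September 678",
                     "location2", "July 123", "August 345")

list_of_locations <- list()
current <- NULL  # hypothetical helper: the location currently being filled

for (word in original_vector) {
  if (word %in% locations) {
    # a location starts a new, empty group named after it
    current <- word
    list_of_locations[[current]] <- character(0)
  } else if (!is.null(current)) {
    # information is appended to the most recent location;
    # strings before the first location are skipped
    list_of_locations[[current]] <- c(list_of_locations[[current]], word)
  }
}

str(list_of_locations)
```

Growing vectors inside a loop is fine at this size, but the vectorised `split()` approach in the answer scales better.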

Answers (1)

Roland

Not a useful data format, but here you are:

split(original_vector, 
  cumsum(
    grepl("location", original_vector, fixed = TRUE) #search for the word "location"
  )
)
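To see why this works, it can help to look at the intermediate values. `grepl()` marks every element that contains the substring "location" (note that with `fixed = TRUE` any information entry containing that word would also start a new group), and `cumsum()` turns those marks into a running group id. The names `is_location`, `group_id` and `groups` are only illustrative.

```r
original_vector <- c("location1", "July 123", "August 345", "September 678",
                     "location2", "July 123", "August 345")

# TRUE wherever an element contains the substring "location"
is_location <- grepl("location", original_vector, fixed = TRUE)
is_location
#[1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

# each TRUE bumps the running count, so it becomes a group id
group_id <- cumsum(is_location)
group_id
#[1] 1 1 1 1 2 2 2

# split() collects the elements that share a group id
groups <- split(original_vector, group_id)
```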
#$`1`
#[1] "location1"     "July 123"      "August 345"    "September 678"
#
#$`2`
#[1] "location2"  "July 123"   "August 345"

Or (thanks to @Ronak) if you have the locations vector:

split(original_vector, cumsum(original_vector %in% locations))
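If you want the location1: information, ... shape from the question, one option is to name each group after its first element and then drop that element. This sketch assumes `original_vector` starts with a location; otherwise the first group (id 0) would be named after an information entry. The names `groups` and `info_by_location` are illustrative.

```r
locations <- c("location1", "location2")
original_vector <- c("location1", "July 123", "August 345", "September 678",
                     "location2", "July 123", "August 345")

groups <- split(original_vector, cumsum(original_vector %in% locations))

# use the first element of each group (the location) as the list name
names(groups) <- vapply(groups, `[`, character(1), 1)

# keep only the information entries under each location name
info_by_location <- lapply(groups, `[`, -1)
info_by_location
#$location1
#[1] "July 123"      "August 345"    "September 678"
#
#$location2
#[1] "July 123"   "August 345"
```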

If your data were actually in the described format (each location followed by exactly 3 information entries), I would turn original_vector into a matrix:

original_vector = c("location1","July 123","August 345", "September 678", "location2","July 123","August 345", "September 678")
t(matrix(original_vector, 4))
#     [,1]        [,2]       [,3]         [,4]           
#[1,] "location1" "July 123" "August 345" "September 678"
#[2,] "location2" "July 123" "August 345" "September 678"

This format allows easy subsetting and other data processing.
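As an illustration of that subsetting, a minimal sketch: it assumes every location has exactly 3 information entries, and the name `info` is illustrative. The locations are moved into row names so rows can be selected by location and columns by position.

```r
original_vector <- c("location1", "July 123", "August 345", "September 678",
                     "location2", "July 123", "August 345", "September 678")

# assumes each location is followed by exactly 3 information entries
stopifnot(length(original_vector) %% 4 == 0)

m <- t(matrix(original_vector, 4))

# drop the location column and use it as row names instead
info <- m[, -1]
rownames(info) <- m[, 1]

info["location2", ]  # all information for one location
info[, 1]            # the first information entry for every location
```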
