Sam Mwenda

Reputation: 154

Is there R code to separate latitude and longitude coordinates from strings of differing lengths?

I have included this minimal example.

cluster_id<-c(1,2)
lat_long<-c("35.92,0.34;35.98,-0.13;35.73,-1.29","38.98,-0.34;40.23,1.23")
d<-data.frame(cluster_id,lat_long)
d

I expect the following output

cluster_id<-c(1,1,1,2,2) 
latitude<-c(35.92,35.98,35.73,38.98,40.23) 
longitude<-c(0.34,-0.13,-1.29,-0.34,1.23) 
c<-data.frame(cluster_id,latitude,longitude)
c

@Akindele Davies provided great feedback using strsplit() and unlist().

However, I am particularly interested in producing the output c above.

Upvotes: 0

Views: 922

Answers (2)

Akindele Davies

Reputation: 399

I already answered your updated question in a comment to my original answer, but I can appreciate that it may have been hard to understand as a comment.

First, we'll combine the steps that I laid out earlier into a function parse_line().

parse_line <- function(line){
    # Split the line into coordinate pairs
    coord_pairs <- strsplit(line, split = ";")
    # Separate the latitude-longitude components
    # We have to unlist coord_pairs because strsplit() expects a character vector
    coords <- strsplit(unlist(coord_pairs), split = ",")
    
    # coords is a list of two-element vectors (lat and long)
    # Combine the elements of coords into a matrix, then coerce to a dataframe
    df <- as.data.frame(do.call(rbind, coords), stringsAsFactors = FALSE)
    df[] <- lapply(df, as.numeric) # strsplit() returns strings, so convert the columns to numeric
    df # Return the dataframe visibly
}

Then we'll use parse_line() as a building block for a similar function parse_lines().

parse_lines <- function(cluster_ids, lines){
  # Iterate over all the pairs of cluster_ids and lines, and add the cluster_id
  # as a column to the dataframe produced by calling parse_line() on the
  # corresponding line
  parsed_dfs <- Map(function(x, y) cbind(x, parse_line(y)), cluster_ids, lines)
  # Combine the list of dataframes into a single dataframe
  combined_df <- do.call(rbind, parsed_dfs)
  colnames(combined_df) <- c("Cluster_ID", "Latitude", "Longitude") # Adds appropriate column names
  return(combined_df)
}

parse_lines(cluster_id, lat_long)
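
For comparison, the same reshaping can probably be done without helper functions. The following is only a sketch in base R, assuming the cluster_id and lat_long vectors from the question; the names pair_list, coords and result are made up for illustration. It repeats each cluster ID once per coordinate pair using lengths(), so rows with different numbers of pairs are handled.

```r
# Sample data from the question
cluster_id <- c(1, 2)
lat_long <- c("35.92,0.34;35.98,-0.13;35.73,-1.29", "38.98,-0.34;40.23,1.23")

# One list element per cluster, each holding that cluster's "lat,long" strings
pair_list <- strsplit(lat_long, split = ";", fixed = TRUE)

# Split every pair on the comma and stack the pieces into a two-column character matrix
coords <- do.call(rbind, strsplit(unlist(pair_list), split = ",", fixed = TRUE))

# Repeat each cluster ID once per pair it contributed, and convert the strings to numbers
result <- data.frame(
  cluster_id = rep(cluster_id, lengths(pair_list)),
  latitude   = as.numeric(coords[, 1]),
  longitude  = as.numeric(coords[, 2])
)
result
```

If the strings come from a data frame column that was read in as a factor (the default before R 4.0), they would need to be wrapped in as.character() first, because strsplit() only accepts character vectors.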

Upvotes: 0

Akindele Davies

Reputation: 399

If I understand your question correctly, you have a single string that is a collection of latitude-longitude pairs. From the sample you posted, each coordinate pair is separated by a semicolon (";") and within each pair, the latitude and longitude are separated by a comma (","). We can use this structure to solve the problem.

foo <- "35.9289842120708,-0.37401629584697;35.9295981311974,-0.370106682789026;35.9289842120708,-0.370106682789026"

# Split into a list of coordinate pairs
coord_pairs <- strsplit(foo, split = ";")

# Separate the latitude-longitude components
coords <- strsplit(unlist(coord_pairs), split = ",") # We have to unlist coord_pairs because strsplit() expects a character vector

# coords is a list of two-element vectors (lat and long)
# Combine the elements of coords into a matrix, then coerce to a dataframe

df <- as.data.frame(do.call(rbind, coords)) 
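
One thing to be aware of: because strsplit() returns strings, the columns of df hold text rather than numbers, and they get the default names V1 and V2. A possible follow-up step, shown here as a self-contained sketch (the column names latitude and longitude are my own choice, not something the question requires), is to rename the columns and convert them to numeric:

```r
foo <- "35.9289842120708,-0.37401629584697;35.9295981311974,-0.370106682789026;35.9289842120708,-0.370106682789026"

# Same steps as above: split into pairs, then split each pair on the comma
coords <- strsplit(unlist(strsplit(foo, split = ";")), split = ",")
df <- as.data.frame(do.call(rbind, coords), stringsAsFactors = FALSE)

# Give the columns descriptive names (illustrative choice)
names(df) <- c("latitude", "longitude")

# Convert every column from character to numeric, keeping the dataframe structure
df[] <- lapply(df, as.numeric)
str(df)
```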

Upvotes: 1
