Reputation: 197
I have a data frame, DT, shown below.
YEAR STATE LEGAL char
1998 24 L00161N11W2 11
1998 24 L00161N11W11 12
1998 24 L00163N9W6 10
1998 24 L00363N9W6 10
Can somebody suggest an efficient way to split the LEGAL column and arrange the pieces as shown in the "Output" below? I have already calculated the "char" column, which holds the number of characters in each LEGAL string.
I need a method that fills the SEC, TOWN and RANGE columns correctly even though the length of the LEGAL string varies.
Output:
YEAR STATE LEGAL char SEC TOWN RANGE
1998 24 L00161N11W2 11 2 61N 11W
1998 24 L00161N11W11 12 11 61N 11W
1998 24 L00163N9W6 10 6 63N 9W
Any help is appreciated.
Thanks in advance.
Upvotes: 1
Views: 234
Reputation: 93813
Using strcapture in the most recent releases of R, which practically mirrors the tidyr::extract solution:
cbind(df,
  strcapture(
    "L\\d{3}(\\d+?\\D+?)(\\d+?\\D+?)(\\d+)",
    as.character(df$LEGAL),
    proto = data.frame(TOWN = character(), RANGE = character(), SEC = character())
  )
)
# YEAR STATE LEGAL char TOWN RANGE SEC
#1 1998 24 L00161N11W2 11 61N 11W 2
#2 1998 24 L00161N11W11 12 61N 11W 11
#3 1998 24 L00163N9W6 10 63N 9W 6
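For R installations that do not have strcapture, roughly the same capture can be sketched with regexec and regmatches from base R. This is only an illustration under stated assumptions: the pattern assumes every LEGAL string is "L", three digits, then TOWN and RANGE each ending in a single capital letter, then SEC as digits. The names legal, pattern, matches and parts are made up for this example, and the fourth string is the extra row from the question.

```r
# Sample LEGAL strings from the question, including its fourth row.
legal <- c("L00161N11W2", "L00161N11W11", "L00163N9W6", "L00363N9W6")

# Assumed layout: "L", three digits, then TOWN and RANGE (digits plus one
# capital letter each), then SEC (digits only).
pattern <- "^L[0-9]{3}([0-9]+[A-Z])([0-9]+[A-Z])([0-9]+)$"

# regexec() reports the full match plus one entry per capture group;
# regmatches() turns those positions into substrings.
matches <- regmatches(legal, regexec(pattern, legal))

# Stack the per-string results and drop the full-match column.
parts <- do.call(rbind, matches)[, -1, drop = FALSE]
parts <- as.data.frame(parts, stringsAsFactors = FALSE)
names(parts) <- c("TOWN", "RANGE", "SEC")

# SEC is purely numeric, so store it as an integer.
parts$SEC <- as.integer(parts$SEC)
parts
```

A string that does not match the pattern produces an empty result, so the stacked rows would no longer line up with the input. Checking lengths(matches) before binding is advisable on real data.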
Upvotes: 1
Reputation: 520968
One base R solution would be to just use gsub. In the code snippet below, I tried to use regex patterns that are as flexible as possible. For example, to extract the SEC column, the pattern matches everything up to the last non-digit character, then captures the digits that follow. This solution should be relatively robust to variations in your legal string.
df <- data.frame(YEAR=c(1998, 1998, 1998),
                 STATE=c(24, 24, 24),
                 LEGAL=c("L00161N11W2", "L00161N11W11", "L00163N9W6"),
                 char=c(11, 12, 10))
df$SEC <- gsub(".*[^0-9]([0-9]+)", "\\1", df$LEGAL)
df$TOWN <- gsub("L[0-9]{3}([0-9]+[^0-9]+).*", "\\1", df$LEGAL)
df$RANGE <- gsub(".*?([0-9]+[^0-9]+)[0-9]+$", "\\1", df$LEGAL)
df
Output:
YEAR STATE LEGAL char SEC TOWN RANGE
1 1998 24 L00161N11W2 11 2 61N 11W
2 1998 24 L00161N11W11 12 11 61N 11W
3 1998 24 L00163N9W6 10 6 63N 9W
Upvotes: 2
Reputation: 4534
You could use extract from the tidyr package:
library(tidyverse)
df <- read_table("YEAR STATE LEGAL char
1998 24 L00161N11W2 11
1998 24 L00161N11W11 12
1998 24 L00163N9W6 10")
df %>% extract(LEGAL, c("TOWN", "RANGE", "SEC"), "L\\d{3}(\\d+\\D)(\\d+\\D)(\\d+)", remove = FALSE)
#> # A tibble: 3 x 7
#> YEAR STATE LEGAL TOWN RANGE SEC char
#> * <int> <int> <chr> <chr> <chr> <chr> <int>
#> 1 1998 24 L00161N11W2 61N 11W 2 11
#> 2 1998 24 L00161N11W11 61N 11W 11 12
#> 3 1998 24 L00163N9W6 63N 9W 6 10
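In the output above SEC comes back as a character column. If a numeric section is wanted, the convert argument of extract can handle that. The sketch below is an illustration, not part of the original answer: it assumes tidyr is installed, adds the question's fourth row (L00363N9W6), and uses a pattern that allows any three digits after the leading "L" so that row matches too. The name out is made up for this example.

```r
library(tidyr)

# Data from the question, including its fourth row.
df <- data.frame(
  YEAR  = c(1998, 1998, 1998, 1998),
  STATE = c(24, 24, 24, 24),
  LEGAL = c("L00161N11W2", "L00161N11W11", "L00163N9W6", "L00363N9W6"),
  stringsAsFactors = FALSE
)

# convert = TRUE runs type conversion on the new columns, so the all-digit
# SEC becomes integer while TOWN and RANGE stay character.
out <- tidyr::extract(
  df, LEGAL,
  into    = c("TOWN", "RANGE", "SEC"),
  regex   = "L\\d{3}(\\d+\\D)(\\d+\\D)(\\d+)",
  remove  = FALSE,
  convert = TRUE
)
out
```

Rows whose LEGAL value does not match the pattern get NA in all three new columns, which makes malformed strings easy to spot.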
Upvotes: 1