user3408139

Reputation: 197

Splitting a string into columns when the parts vary in character length in R

I have the data frame below, named DT.

YEAR  STATE LEGAL         char
1998  24    L00161N11W2     11
1998  24    L00161N11W11    12
1998  24    L00163N9W6      10
1998  24    L00363N9W6      10

Can somebody suggest an efficient way to split the LEGAL column and arrange the pieces as shown in the "output" below? I have already calculated the "char" column, which gives the number of characters in each LEGAL string.

I need a method that assigns the SEC, TOWN and RANGE columns correctly even though the length of the LEGAL string varies from row to row.

Output:
    YEAR  STATE LEGAL         char   SEC  TOWN  RANGE
    1998  24    L00161N11W2     11   2    61N   11W
    1998  24    L00161N11W11    12   11   61N   11W
    1998  24    L00163N9W6      10   6    63N   9W

Any help is appreciated.

Thanks in advance.

Upvotes: 1

Views: 234

Answers (3)

thelatemail

Reputation: 93813

Using strcapture, available in base R since version 3.4.0, which practically mirrors the tidyr::extract solution:

cbind(
  df,
  strcapture(
    "L\\d{3}(\\d+?\\D+?)(\\d+?\\D+?)(\\d+)",
    as.character(df$LEGAL),
    proto = data.frame(TOWN = character(), RANGE = character(), SEC = character())
  )
)

#  YEAR STATE        LEGAL char TOWN RANGE SEC
#1 1998    24  L00161N11W2   11  61N   11W   2
#2 1998    24 L00161N11W11   12  61N   11W  11
#3 1998    24   L00163N9W6   10  63N    9W   6
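
As a small variation on the above, the column types in proto decide how each captured group is converted. The sketch below is only an illustration, not part of the original answer: it declares SEC as integer() so the section number comes back numeric, and it includes the fourth LEGAL value from the question ("L00363N9W6") to check that the pattern does not depend on the "001" prefix. The names legal and parts are made up for this example.

```r
# LEGAL values from the question, including the fourth row
legal <- c("L00161N11W2", "L00161N11W11", "L00163N9W6", "L00363N9W6")

# Same pattern as above; only the proto differs:
# SEC is declared as integer so strcapture converts it for us
parts <- strcapture(
  "L\\d{3}(\\d+?\\D+?)(\\d+?\\D+?)(\\d+)",
  legal,
  proto = data.frame(TOWN = character(), RANGE = character(), SEC = integer(),
                     stringsAsFactors = FALSE)
)

parts
```

Rows that do not match the pattern should come back as NA in every captured column, so a quick anyNA(parts) check is a cheap way to spot LEGAL strings with an unexpected layout.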

Upvotes: 1

Tim Biegeleisen

Reputation: 520968

One base R solution is to use gsub. In the snippet below, I tried to keep the regex patterns as flexible as possible. For example, to extract the SEC column, the pattern matches everything up to the last non-digit character, then captures the digits that follow. This should be relatively robust to variations in your LEGAL string.

df <- data.frame(YEAR=c(1998, 1998, 1998),
                 STATE=c(24, 24, 24),
                 LEGAL=c("L00161N11W2", "L00161N11W11", "L00163N9W6"),
                 char=c(11, 12, 10))

df$SEC   <- gsub(".*[^0-9]([0-9]+)", "\\1", df$LEGAL)
df$TOWN  <- gsub("L[0-9]{3}([0-9]+[^0-9]+).*", "\\1", df$LEGAL)
df$RANGE <- gsub(".*?([0-9]+[^0-9]+)[0-9]+$", "\\1", df$LEGAL)
df

Output:

  YEAR STATE        LEGAL char SEC TOWN RANGE
1 1998    24  L00161N11W2   11   2  61N   11W
2 1998    24 L00161N11W11   12  11  61N   11W
3 1998    24   L00163N9W6   10   6  63N    9W

Demo here:

Rextester
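
If you would rather pull all three pieces out with a single pattern while staying in base R, regexec plus regmatches is one option. This is a rough sketch and not the code from the demo; it assumes every LEGAL value follows the layout "L", three digits, TOWN, RANGE, SEC, and the names legal, m and parts are invented for the example.

```r
# LEGAL values from the question, including the fourth row
legal <- c("L00161N11W2", "L00161N11W11", "L00163N9W6", "L00363N9W6")

# One anchored pattern with three capture groups: TOWN, RANGE, SEC
pattern <- "^L[0-9]{3}([0-9]+[^0-9]+)([0-9]+[^0-9]+)([0-9]+)$"
m <- regmatches(legal, regexec(pattern, legal))

# Each list element is c(full match, group 1, group 2, group 3);
# drop the full match and stack the groups into a character matrix.
# Assumption: every string matches, otherwise the rows would not line up.
parts <- do.call(rbind, lapply(m, function(x) x[-1]))
colnames(parts) <- c("TOWN", "RANGE", "SEC")

as.data.frame(parts, stringsAsFactors = FALSE)
```

The result can be attached to the original data frame with cbind, provided the rows are in the same order as the LEGAL column.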

Upvotes: 2

markdly

Reputation: 4534

You could use extract from the tidyr package:

library(tidyverse)

df <- read_table("YEAR  STATE LEGAL         char
1998  24    L00161N11W2     11
1998  24    L00161N11W11    12
1998  24    L00163N9W6      10") 

df %>% extract(LEGAL, c("TOWN", "RANGE", "SEC"), "L\\d{3}(\\d+\\D)(\\d+\\D)(\\d+)", remove = FALSE)
#> # A tibble: 3 x 7
#>    YEAR STATE        LEGAL  TOWN RANGE   SEC  char
#> * <int> <int>        <chr> <chr> <chr> <chr> <int>
#> 1  1998    24  L00161N11W2   61N   11W     2    11
#> 2  1998    24 L00161N11W11   61N   11W    11    12
#> 3  1998    24   L00163N9W6   63N    9W     6    10

Upvotes: 1
