Reputation: 583
I've got data.frame like the one below
ID age location
1 83 country=X;province=A;city=J
2 15 country=X;city=K
3 2 country=Y;province=B;city=I
4 12 country=X;city=L
5 2 country=Y;city=J
6 2 country=Y;province=A;city=M
7 18 country=X;province=B;city=J
8 85 country=X;province=A;city=I
To describe it: there is third column (location) in which records contain strings separated by ";" and have different lengths.
As a result, I need to obtain the data.frame as such:
ID age country city
1 83 X J
2 15 X K
3 2 Y I
4 12 X L
5 2 Y J
6 2 Y M
7 18 X J
8 85 X I
To describe - I need to separate the column and choose only records about country and city (without province). separate from dplyr allow only to divide by ";" but there is different number of ";" in rows. What should I do?
Upvotes: 3
Views: 913
Reputation: 39858
A different tidyverse
possibility could be:
df %>%
separate(location, c("country", "city"), sep = ";") %>%
mutate_at(3:4, funs(sub(".*=", "", .)))
ID age country city
1 1 83 X A
2 2 15 X K
3 3 2 Y B
4 4 12 X L
5 5 2 Y J
6 6 2 Y A
7 7 18 X B
8 8 85 X A
In the first step it separates "location" into "country" and "city" based on ;
. Then, it extract the elements after =
from the newly created columns.
Upvotes: 1
Reputation: 4520
You can do it with tidyr::extract
function:
library(tidyverse)
extract(
data = dat,
col = location,
into = c('country', 'city'),
regex = "^country=([[:alpha:]]+).*city=([[:alpha:]]+)$"
)
ID age country city
1 1 83 X J
2 2 15 X K
3 3 2 Y I
4 4 12 X L
5 5 2 Y J
6 6 2 Y M
7 7 18 X J
8 8 85 X I
Data
dat <- read.table(
text = "ID age location
1 83 country=X;province=A;city=J
2 15 country=X;city=K
3 2 country=Y;province=B;city=I
4 12 country=X;city=L
5 2 country=Y;city=J
6 2 country=Y;province=A;city=M
7 18 country=X;province=B;city=J
8 85 country=X;province=A;city=I",
header = T, stringsAsFactors = F
)
Upvotes: 2
Reputation: 764
df <- read.table(text='ID age location
1 83 country=X;province=A;city=J
2 15 country=X;city=K
3 2 country=Y;province=B;city=I
4 12 country=X;city=L
5 2 country=Y;city=J
6 2 country=Y;province=A;city=M
7 18 country=X;province=B;city=J
8 85 country=X;province=A;city=I
', header= T)
my_fun <- function(x){
y <- as.data.frame(strsplit(x,'='))
names(y) <- as.character(unlist(y[1,]))
y <- y[-1,]
}
f <- strsplit(as.character(df$location), ";")
s <- data.table::rbindlist(lapply(f,my_fun), fill=T)
df$location <- NULL
df <- cbind(df, s)
df
ID age country province city
1 1 83 X A J
2 2 15 X <NA> K
3 3 2 Y B I
4 4 12 X <NA> L
5 5 2 Y <NA> J
6 6 2 Y A M
7 7 18 X B J
8 8 85 X A I
Upvotes: 1
Reputation: 15062
You can do this with separate_rows
and some subsequent reshaping. separate_rows
puts things into long format, so we can do some filtering and then spread
to get the desired result:
library(tidyverse)
tbl <- read_table2(
"ID age location
1 83 country=X;province=A;city=J
2 15 country=X;city=K
3 2 country=Y;province=B;city=I
4 12 country=X;city=L
5 2 country=Y;city=J
6 2 country=Y;province=A;city=M
7 18 country=X;province=B;city=J
8 85 country=X;province=A;city=I"
)
tbl %>%
separate_rows(location, sep = ";") %>%
separate(location, c("location_type", "value")) %>%
filter(location_type %in% c("country", "city")) %>%
spread(location_type, value)
#> # A tibble: 8 x 4
#> ID age city country
#> <dbl> <dbl> <chr> <chr>
#> 1 1 83 J X
#> 2 2 15 K X
#> 3 3 2 I Y
#> 4 4 12 L X
#> 5 5 2 J Y
#> 6 6 2 M Y
#> 7 7 18 J X
#> 8 8 85 I X
Because you only have two cases to deal with, you might find it faster or easier to just use regex to directly extract the values of interest:
tbl %>%
mutate(
country = str_extract(location, "(?<=country\\=)."),
city = str_extract(location, "(?<=city\\=).")
)
#> # A tibble: 8 x 5
#> ID age location country city
#> <dbl> <dbl> <chr> <chr> <chr>
#> 1 1 83 country=X;province=A;city=J X J
#> 2 2 15 country=X;city=K X K
#> 3 3 2 country=Y;province=B;city=I Y I
#> 4 4 12 country=X;city=L X L
#> 5 5 2 country=Y;city=J Y J
#> 6 6 2 country=Y;province=A;city=M Y M
#> 7 7 18 country=X;province=B;city=J X J
#> 8 8 85 country=X;province=A;city=I X I
Created on 2019-02-25 by the reprex package (v0.2.1)
Upvotes: 2