Reputation: 1875
I have a data.frame with a column "offence". Each offence consists of an article (Art), optionally followed by a paragraph (Abs) or a sub-paragraph (Ziff), and the abbreviation of the law:
df <- data.frame(offence = c("Art. 110 Abs. 3 StGB", "Art. 10 Abs. 1 StGB", "Art. 122 SVG", "Art. 1 Ziff. 2 UWG"))
> df
               offence
1 Art. 110 Abs. 3 StGB
2  Art. 10 Abs. 1 StGB
3         Art. 122 SVG
4   Art. 1 Ziff. 2 UWG
But I need it in this form:
  Art Ziff Abs  Law
1 110   NA   3 StGB
2  10   NA   1 StGB
3 122   NA  NA  SVG
4   1    2  NA  UWG
What is the best way to get this result? lapply?
Thank you!
Upvotes: 1
Views: 224
Reputation: 269441
Convert it to dcf form (i.e. keyword: value) using gsub and sub, and then read it in using read.dcf. At the end, convert the matrix that read.dcf produces to a data frame and convert any columns that contain only numbers to numeric. No packages are used.
s <- gsub("(\\S+)[.] (\\d+)", "\\1: \\2\n", df[[1]]) # convert to keyword: value
s <- sub(" (\\D+)$", "Law: \\1\n\n", s) # handle Law column
us <- trimws(unlist(strsplit(s, "\n"))) # split into separate components
DF <- as.data.frame(read.dcf(textConnection(us)), stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE) # numbers become integer, Law stays character
giving:
  Art Abs  Law Ziff
1 110   3 StGB   NA
2  10   1 StGB   NA
3 122  NA  SVG   NA
4   1  NA  UWG    2
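The columns above come out in the order in which the keywords first appear in the input, not in the order asked for in the question. A minimal sketch of the same approach with a final reordering step, assuming the four column names from the question and using `as.is = TRUE` so that the Law column stays character (the vector name `wanted` is only illustrative):

```r
# Same input as in the question
df <- data.frame(offence = c("Art. 110 Abs. 3 StGB", "Art. 10 Abs. 1 StGB",
                             "Art. 122 SVG", "Art. 1 Ziff. 2 UWG"),
                 stringsAsFactors = FALSE)

# "Art. 110" becomes "Art: 110\n", and so on for every keyword/number pair
s <- gsub("(\\S+)[.] (\\d+)", "\\1: \\2\n", df[[1]])
# The trailing law abbreviation becomes "Law: StGB" plus a blank line,
# which read.dcf treats as the end of a record
s <- sub(" (\\D+)$", "Law: \\1\n\n", s)
# One field per element; trimws() removes the leading space that would
# otherwise make read.dcf treat a line as a continuation line
us <- trimws(unlist(strsplit(s, "\n")))

DF <- as.data.frame(read.dcf(textConnection(us)), stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE)

# Put the columns in the order requested in the question
wanted <- c("Art", "Ziff", "Abs", "Law")
DF <- DF[wanted]
DF
```

If some keyword never occurs in the data, its column will not exist and the indexing by `wanted` will fail, so this only fits inputs that contain all three keywords at least once.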
Upvotes: 1
Reputation: 18661
You can use str_extract from stringr:
library(stringr)
library(dplyr)
df$offence %>%
  {data.frame(Art = str_extract(., "(?<=Art[.]\\s)\\d+"),
              Ziff = str_extract(., "(?<=Ziff[.]\\s)\\d+"),
              Abs = str_extract(., "(?<=Abs[.]\\s)\\d+"),
              Law = str_extract(., "\\w+$"))}
Result:
  Art Ziff  Abs  Law
1 110 <NA>    3 StGB
2  10 <NA>    1 StGB
3 122 <NA> <NA>  SVG
4   1    2 <NA>  UWG
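The `<NA>` entries show that the extracted values are not numeric. If integers are wanted, or stringr is not available, the same lookbehind patterns should also work in base R with `perl = TRUE`. A minimal sketch, where `extract_first` is a hypothetical helper written for this example and not part of any package:

```r
# Same input as in the question
df <- data.frame(offence = c("Art. 110 Abs. 3 StGB", "Art. 10 Abs. 1 StGB",
                             "Art. 122 SVG", "Art. 1 Ziff. 2 UWG"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: first match of `pattern` in each element of x,
# NA where there is no match
extract_first <- function(x, pattern) {
  x <- as.character(x)
  m <- regexpr(pattern, x, perl = TRUE)  # perl = TRUE enables lookbehind
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)       # regmatches returns matched elements only
  out
}

res <- data.frame(
  Art  = as.integer(extract_first(df$offence, "(?<=Art[.]\\s)\\d+")),
  Ziff = as.integer(extract_first(df$offence, "(?<=Ziff[.]\\s)\\d+")),
  Abs  = as.integer(extract_first(df$offence, "(?<=Abs[.]\\s)\\d+")),
  Law  = extract_first(df$offence, "\\w+$"),
  stringsAsFactors = FALSE
)
res
```

With stringr, the equivalent change would be to wrap each numeric `str_extract()` call in `as.integer()`.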
Upvotes: 1