Reputation: 151
I would like to extract number information in cells. My data looks like this.
item stock
PRE 24DX4SX15G 200
PLS 12RX10SX15G 200
ADU 24PX200ML 200
NIS 18PX40SX11G 200
REF 500GX12D 200
i want to extract the numbers which located besides alphabet D,R and P. I want to use this number to do multiplication with the stock. For example like this.
item stock pcs total
PRE 24DX4SX15G 200 24 4800
PLS 12RX10SX15G 200 12 2400
ADU 24PX200ML 200 24 4800
NIS 18PX40SX11G 200 18 3600
REF 500GX12D 200 12 2400
anyone know how to extract the numbers? thanks very much in advance
Upvotes: 0
Views: 65
Reputation: 1666
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
dataset <- read.table(text = ' item stock
PRE 24DX4SX15G 200
PLS 12RX10SX15G 200
ADU 24PX200ML 200
NIS 18PX40SX11G 200
REF 500GX12D 200',
header = TRUE)
dataset %>%
mutate(pcs = as.numeric(x = str_sub(string = str_extract(string = item,
pattern = "\\d+[DRP]"),
start = 1,
end = -2)),
total = (stock * pcs))
#> item stock pcs total
#> 1 24DX4SX15G 200 24 4800
#> 2 12RX10SX15G 200 12 2400
#> 3 24PX200ML 200 24 4800
#> 4 18PX40SX11G 200 18 3600
#> 5 500GX12D 200 12 2400
Created on 2019-07-12 by the reprex package (v0.3.0)
Hope this helps.
Upvotes: 2
Reputation: 17110
I would use a suitable regular expression:
df$pcs <- gsub("^(.*[^0-9]|)([0-9]+)[DPR].*", "\\2", df$item)
df$pcs <- as.numeric(df$pcs)
Explanation:
[DPR]
means we are matching one character which is either D, P or R[^0-9]
means we are matching anything but one of the digits 0-9\\2
^([0-9]+)[DPR]
, or it is somewhere inside the text. Therefore we have to first match the beginning of string (^
) and then alternatively match either nothing or anything that does not end in a number (.*[^0-9]
).EDIT: with the stringr library (as @yamabrina showed in the other answer) the regular expression is much simpler:
library(stringr)
df$pcs <- as.numeric(str_sub(str_extract(df$item, "[0-9]+[DPR]"), 1, -2))
Upvotes: 1