how to extract some data from cells?

Question

I would like to extract number information in cells. My data looks like this.

      item       stock
PRE 24DX4SX15G    200      
PLS 12RX10SX15G   200
ADU 24PX200ML     200
NIS 18PX40SX11G   200
REF 500GX12D      200

i want to extract the numbers which located besides alphabet D,R and P. I want to use this number to do multiplication with the stock. For example like this.

          item       stock    pcs    total
    PRE 24DX4SX15G    200      24    4800
    PLS 12RX10SX15G   200      12    2400 
    ADU 24PX200ML     200      24    4800
    NIS 18PX40SX11G   200      18    3600
    REF 500GX12D      200      12    2400

anyone know how to extract the numbers? thanks very much in advance

January · Accepted Answer

I would use a suitable regular expression:

df$pcs <- gsub("^(.*[^0-9]|)([0-9]+)[DPR].*", "\2", df$item)
df$pcs <- as.numeric(df$pcs)

Explanation:

[DPR] means we are matching one character which is either D, P or R
[^0-9] means we are matching anything but one of the digits 0-9
The parentheses create two regular expression groups, first being a prefix and the second being the actual number we want to match.
we can use the second group in substitute by preceding 2 with a backslash; only to get a backslash in R, we need to enter two backslashes: \2
The number of interest (preceding a D, P or R) can be found at different locations. Either it is at the beginning, in which case we should match ^([0-9]+)[DPR], or it is somewhere inside the text. Therefore we have to first match the beginning of string (^) and then alternatively match either nothing or anything that does not end in a number (.*[^0-9]).

EDIT: with the stringr library (as @yamabrina showed in the other answer) the regular expression is much simpler:

library(stringr)
df$pcs <- as.numeric(str_sub(str_extract(df$item, "[0-9]+[DPR]"), 1, -2))

how to extract some data from cells?

Answers (2)

Related Questions