Umar Arshad

Reputation: 123

Extracting numbers in different formats from text in Google Sheets

Column A of my sheet contains text (a job post) in each cell. I want to extract the job salary (a number such as 2000, 20,000, or 4.5k) into the corresponding cell in column D.

I have shared my Google Sheet, which has columns for the desired and the current output. Here is the Google Sheet link.

Currently I am using a custom script that extracts only plain numbers (for example 3000, 2000, 9000). The range of numbers I want to extract is in another sheet named "Data Lookup".

The custom script is below:


    /**
     * @customfunction
     */
    function salary(text, list){
      var result = 'Negotiable';
      list = list.flat();
      list.forEach( str => {
        if(text.includes(str)){
          result = str;
          return result;
        }
      });
      return result;
    }

I want to extract numbers that contain commas (such as 20,000, 30,000, 50,000), numbers above 9000 (such as 10000 and 11000; currently only 1000 and 1100 are extracted from these), and alphanumeric values (such as 4.5 k, 3.5k, 9k).
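
A minimal reproduction of the behavior described above, runnable outside Sheets. The lookup values and sample texts are made up for illustration; the real list lives in the "Data Lookup" sheet.

```javascript
// Same logic as the custom function above, trimmed for a standalone run.
function salary(text, list) {
  let result = 'Negotiable';
  list.flat().forEach(str => {
    // includes() is a plain substring test, so 1000 "matches" inside 10000.
    if (text.includes(str)) result = str;
  });
  return result;
}

// Hypothetical lookup values, shaped like a one-column sheet range.
const lookup = [[1000], [1100], [2000], [3000]];

console.log(salary('Salary 3000 AED', lookup));   // 3000
console.log(salary('Salary 10000 AED', lookup));  // 1000, not 10000
console.log(salary('Salary 20,000 AED', lookup)); // Negotiable (the comma breaks the match)
```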

How can I achieve this? Thanks in advance.

Upvotes: 0

Views: 444

Answers (2)

Erik Tyler

Reputation: 9345

I'd suggest that you do not need a script to accomplish this.

I've added a new sheet ("Erik Help") to your sample spreadsheet. In that sheet, I have placed a single formula in B2:

=ArrayFormula(IF(A2:A="",,IFERROR(REGEXEXTRACT(REGEXEXTRACT(REGEXREPLACE(LOWER(TRIM(REGEXREPLACE(SUBSTITUTE(A2:A,CHAR(10),""),",",""))),"^[^~]*([^~]{8}aed.*$)","$1"),"\d{4,6}|\d*\.*\d\s*k"),"[\d\.]+")*IF(NOT(ISERROR(REGEXEXTRACT(LOWER(A2:A),"\d\s*k"))),1000,1),"unspecified")))

This one formula will produce results for all rows (based on the limited current information in the sheet). It assumes that you want salary listings rounded to one of the numbers in the 'Data Lookup'!A2:A list.

I added into the formula a few extra parts that will assure you don't wind up extracting a number that you didn't mean to extract.
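
For readers who want to try the formula's regular expressions outside Sheets, here is a rough JavaScript rendering of its steps for a single cell. It is only a sketch: the name `salaryFromFormulaSteps` is hypothetical, Sheets' `TRIM` is approximated by collapsing spaces, and Sheets (RE2) and JavaScript regex engines can differ in edge cases.

```javascript
// Sketch only: mirrors the formula's steps for one cell's text.
function salaryFromFormulaSteps(text) {
  if (text === '') return '';

  // Drop line feeds and commas, collapse spaces (roughly TRIM), lowercase.
  const cleaned = text
    .replace(/\n/g, '')
    .replace(/,/g, '')
    .replace(/ +/g, ' ')
    .trim()
    .toLowerCase();

  // If "aed" occurs with at least 8 characters before it, keep only the text
  // from 8 characters before its last occurrence onward.
  const focused = cleaned.replace(/^[^~]*([^~]{8}aed.*$)/, '$1');

  // A 4-6 digit run, or a number followed by "k" (for example "4.5k").
  const token = focused.match(/\d{4,6}|\d*\.*\d\s*k/);
  if (!token) return 'unspecified';

  const numeric = token[0].match(/[\d.]+/)[0];

  // As in the formula, the "k" check runs on the whole original text.
  const multiplier = /\d\s*k/.test(text.toLowerCase()) ? 1000 : 1;
  return Number(numeric) * multiplier;
}
```

For example, `salaryFromFormulaSteps('Salary 20,000 AED per month')` gives 20000 and `salaryFromFormulaSteps('Pay: 4.5k')` gives 4500.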

If this formula suits you, you can get rid of your script and simply use the formula. I always recommend using formulas where possible and reserving scripts for those times when formulas cannot accomplish the same thing. Following this guideline reduces potential problems all around.

Upvotes: 2

Yuri Khristich

Reputation: 14502

For this particular dataset this function should work:

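    function get_salary(text){
      // Take the last space-separated 'word' that contains a digit, without commas
      var num = text.split(' ').reverse()
      .find(x => /\d/.test(x))
      .replace(/,/g,'');

      // A 'k' in the word means thousands
      if (/k/i.test(num)) num = parseFloat(num) * 1000;

      return num;
    }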


It takes the last 'word' that contains a digit, removes all commas, parses it, multiplies by 1000 if there is a 'k', and returns the result.

But parsing such texts is a tricky task. Basically, there is no simple general solution.
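
One thing to watch with the function above: if no word contains a digit, `find` returns `undefined` and the `.replace` call throws. A minimal guarded sketch follows; the name `get_salary_or_default` and the 'Negotiable' fallback (borrowed from the question's script) are assumptions, not part of the original function.

```javascript
// Hypothetical guarded variant: same steps, plus a fallback for "no number".
function get_salary_or_default(text, fallback = 'Negotiable') {
  // Last space-separated word that contains a digit, if any.
  const word = text.split(' ').reverse().find(x => /\d/.test(x));
  if (word === undefined) return fallback;

  const num = word.replace(/,/g, '');
  // parseFloat ignores trailing characters such as "k".
  return /k/i.test(num) ? parseFloat(num) * 1000 : num;
}

console.log(get_salary_or_default('Salary 4.5k'));        // 4500
console.log(get_salary_or_default('Salary negotiable'));  // Negotiable
```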

Update

Here is the updated version of the function:

    function get_salary(text){
      var num = text.split(/\s/).reverse()
      // Skip tokens with fewer than three digit or 'k' characters
      .filter(x => x.replace(/[^0-9k]/gi,'').length > 2)
      .find(x => /\d/.test(x))
      .replace(/,/g,'');

      if (/k/i.test(num)) num = parseFloat(num) * 1000;

      return num;
    }

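It splits the text on any whitespace (\s) instead of only spaces, and skips tokens that have fewer than three characters left once everything except digits and 'k' is removed. Note that a short token such as '9k' is skipped by this rule too.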
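
Because the text is split on whitespace, a value written with a space before the suffix, such as the `4.5 k` mentioned in the question, ends up as two tokens and is not found. One possible alternative is a regex scan over the whole text. This is a sketch under assumptions, not part of the function above: the name `find_salary`, the pattern, and the `minimum` threshold of 100 are all hypothetical choices.

```javascript
// Hypothetical alternative: scan the whole text instead of splitting it.
function find_salary(text, minimum = 100) {
  // A number with optional thousands commas and decimals, then an optional "k".
  const pattern = /(\d[\d,]*(?:\.\d+)?)\s*(k\b)?/gi;
  let found = null;

  for (const match of text.matchAll(pattern)) {
    let value = parseFloat(match[1].replace(/,/g, ''));
    if (match[2]) value *= 1000;
    // Keep the last candidate that is large enough to be a salary.
    if (value >= minimum) found = value;
  }
  return found; // null when nothing qualifies
}

console.log(find_salary('Salary 4.5 k per month'));              // 4500
console.log(find_salary('Need 2 years experience, pay 10000'));  // 10000
```

Like the split-based version, this still picks up any large number near the end of the text (a phone number, for instance), so it would need tuning against the real data.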


But keep in mind: there will never be a final solution.

Upvotes: 2
