Reputation: 751

MySQL and Python: Convert string value to int, to float or what?

I have a string that contains salary information in the following way:

salaryMixed = "£25,000 - £30,000"

Sometimes it will look like this:

salaryMixed = "EUR25,000 - EUR30,000"

And others times like this:

salaryMixed = "£37.50 - £50.00"

What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding. So far I have:

if salaryMixed.find('£')!=-1: # found £ char
    salaryMixed = salaryMixed.replace("£", "")
if salaryMixed.find('-')!=-1: # found hyphen
    salaryMixed = salaryMixed.replace("-", "")
if salaryMixed.find(',')!=-1: # found comma
    salaryMixed = salaryMixed.replace(",", "")
if salaryMixed.find('EUR')!=-1: # found EUR
    salaryMixed = salaryMixed.replace("EUR", "")
salaryMixed = re.sub('\s{2,}', ' ', salaryMixed) # to remove multiple space

if len(salaryList) == 1:
    salaryLow = map(int, 0) in salaryList
    salaryHigh = 00000
else:
    salaryLow = int(salaryList.index(1))
    salaryHigh = int(salaryList.index(2))

But I am stumped with how to split the two values up, and also how to handle the decimal point when salaryMixed isn't an annual salary but rather per hour as in the case of salaryMixed = "£37.50 - £50.00" because isn't that a float?

I am wanting to store this information in a MySQL DB later on in the code but I have described the table as:

CREATE TABLE jobs(
   job_id INT NOT NULL AUTO_INCREMENT,
   job_title VARCHAR(300) NOT NULL,
   job_salary_low INT(25),
   job_salary_high INT(25),
   PRIMARY KEY ( job_id )
);

What is the best approach here? Thanks.

Upvotes: 1

Answers (3)

theodox

Reputation: 12218

This is a good case for a regular expression from the python re module. And you'll probably want to upcast the hourly rates to annual (assuming you have a consistent average hourly

import re

def salary_band(val):
    currency = 'EUR' if 'EUR' in val else 'GBP'
    numbers = re.findall("[0-9.\,]*", val) # this will have a bunch of empty entries and two numbers
    numbers = [i.replace(",","") for i in numbers if i] # filter out empty strings, remove commas
    numbers = map(float, numbers) # convert to floats
    annual = lambda p: int(p) if p > 2000 else int( p * 1800) # your number here...
    return currency, map(annual, numbers)

print salary_band ( "gbp37.50 - gbp50.00")
print salary_band ( "EUR25,000 - EUR30,000")
>> ('GBP', [75000, 100000])
>> ('EUR', [25000, 30000])

Here i'm returning the currency type and the high/low numbers as a tuple - you can unpack it easily into your table

Upvotes: 1

Kapila Clan

Reputation: 85

for storing the values in db, you can use MySQLdb library in python.It's easy to use and will store al your data to database. Here check it out.

You can install it by apt-get install python-mysqldb

Upvotes: 0

John La Rooy

Reputation: 304503

What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding. So far I have:

Ok taking this one step at a time. Remove all the characters but the numeric values (Better keep spaces and periods too)

>>> testcases =  ["£25,000 - £30,000", "EUR25,000 - EUR30,000", "£37.50 - £50.00"]
>>> res = [''.join(x for x in tc if x.isdigit() or x.isspace() or x == '.') for tc in testcases]
>>> res
['25000  30000', '25000  30000', '37.50  50.00']

ok, now split them

>>> res = [x.split() for x in res]
>>> res
[['25000', '30000'], ['25000', '30000'], ['37.50', '50.00']]

Convert to floats (Decimal might be better)

>>> res = [[float(j) for j in i] for i in res]>>> res
[[25000.0, 30000.0], [25000.0, 30000.0], [37.5, 50.0]]

Put in separate variables

>>> for low, high in res:
...     print (low, high)
... 
25000.0 30000.0
25000.0 30000.0
37.5 50.0

regex as suggested by @Patashu is the easy/lazy way to do it though

Upvotes: 1

MySQL and Python: Convert string value to int, to float or what?

Answers (3)

Related Questions