Reputation: 751
I have a string that contains salary information in the following way:
salaryMixed = "£25,000 - £30,000"
Sometimes it will look like this:
salaryMixed = "EUR25,000 - EUR30,000"
And others times like this:
salaryMixed = "£37.50 - £50.00"
What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding. So far I have:
if salaryMixed.find('£')!=-1: # found £ char
salaryMixed = salaryMixed.replace("£", "")
if salaryMixed.find('-')!=-1: # found hyphen
salaryMixed = salaryMixed.replace("-", "")
if salaryMixed.find(',')!=-1: # found comma
salaryMixed = salaryMixed.replace(",", "")
if salaryMixed.find('EUR')!=-1: # found EUR
salaryMixed = salaryMixed.replace("EUR", "")
salaryMixed = re.sub('\s{2,}', ' ', salaryMixed) # to remove multiple space
if len(salaryList) == 1:
salaryLow = map(int, 0) in salaryList
salaryHigh = 00000
else:
salaryLow = int(salaryList.index(1))
salaryHigh = int(salaryList.index(2))
But I am stumped with how to split the two values up, and also how to handle the decimal point when salaryMixed
isn't an annual salary but rather per hour as in the case of salaryMixed = "£37.50 - £50.00"
because isn't that a float?
I am wanting to store this information in a MySQL DB later on in the code but I have described the table as:
CREATE TABLE jobs(
job_id INT NOT NULL AUTO_INCREMENT,
job_title VARCHAR(300) NOT NULL,
job_salary_low INT(25),
job_salary_high INT(25),
PRIMARY KEY ( job_id )
);
What is the best approach here? Thanks.
Upvotes: 1
Views: 1932
Reputation: 12208
This is a good case for a regular expression from the python re module. And you'll probably want to upcast the hourly rates to annual (assuming you have a consistent average hourly
import re
def salary_band(val):
currency = 'EUR' if 'EUR' in val else 'GBP'
numbers = re.findall("[0-9.\,]*", val) # this will have a bunch of empty entries and two numbers
numbers = [i.replace(",","") for i in numbers if i] # filter out empty strings, remove commas
numbers = map(float, numbers) # convert to floats
annual = lambda p: int(p) if p > 2000 else int( p * 1800) # your number here...
return currency, map(annual, numbers)
print salary_band ( "gbp37.50 - gbp50.00")
print salary_band ( "EUR25,000 - EUR30,000")
>> ('GBP', [75000, 100000])
>> ('EUR', [25000, 30000])
Here i'm returning the currency type and the high/low numbers as a tuple - you can unpack it easily into your table
Upvotes: 1
Reputation: 85
for storing the values in db, you can use MySQLdb library in python.It's easy to use and will store al your data to database. Here check it out.
You can install it by apt-get install python-mysqldb
Upvotes: 0
Reputation: 304137
What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding. So far I have:
Ok taking this one step at a time. Remove all the characters but the numeric values (Better keep spaces and periods too)
>>> testcases = ["£25,000 - £30,000", "EUR25,000 - EUR30,000", "£37.50 - £50.00"]
>>> res = [''.join(x for x in tc if x.isdigit() or x.isspace() or x == '.') for tc in testcases]
>>> res
['25000 30000', '25000 30000', '37.50 50.00']
ok, now split them
>>> res = [x.split() for x in res]
>>> res
[['25000', '30000'], ['25000', '30000'], ['37.50', '50.00']]
Convert to floats (Decimal might be better)
>>> res = [[float(j) for j in i] for i in res]>>> res
[[25000.0, 30000.0], [25000.0, 30000.0], [37.5, 50.0]]
Put in separate variables
>>> for low, high in res:
... print (low, high)
...
25000.0 30000.0
25000.0 30000.0
37.5 50.0
regex as suggested by @Patashu is the easy/lazy way to do it though
Upvotes: 1