Sohel Shaikh

Reputation: 47

How to scale data in pandas or any other Python library

I'm analyzing a company data set that stores 'Company Name' and 'Company Profit'. I also have another data set that has '# of Employees' and 'Feedback (Negative or Positive)'. I want to find out whether companies with higher profit have more positive employee feedback. The problem is that 'Company Profit' will be in the millions or billions, while the number of employees will be quite small by comparison.

So, can I scale the data, or should I do something else here?

Suggestions are welcome.

Upvotes: 0

Views: 56

Answers (1)

If you have a table that looks like this:

  Company Name  Company Profit  # of Employees Feedback (Negative or Positive)
0        Alpha         1000000              10                        Positive
1        Bravo        13000000             210                        Positive
2      Charlie         2300000              16                        Negative
3        Delta          130000               1                        Negative

and want a table that looks like this:

  Company Name  Company Profit (Million)  # of Employees Feedback (Negative or Positive)
0        Alpha                      1.00              10                        Positive
1        Bravo                     13.00             210                        Positive
2      Charlie                      2.30              16                        Negative
3        Delta                      0.13               1                        Negative

Then you can use the apply method and a lambda function to rescale the data.

# This part creates the original table
import pandas as pd

columns = ['Company Name', 'Company Profit', '# of Employees', 'Feedback (Negative or Positive)']
df = pd.DataFrame([('Alpha', 1000000, 10, 'Positive'),
                   ('Bravo', 13000000, 210, 'Positive'),
                   ('Charlie', 2300000, 16, 'Negative'),
                   ('Delta', 130000, 1, 'Negative')], columns=columns)

# This part makes the modification: convert profit to millions
df['Company Profit (Million)'] = df['Company Profit'].apply(lambda x: x / 1000000)
# Keep only the columns of interest, in the desired order
df = df[['Company Name', 'Company Profit (Million)', '# of Employees', 'Feedback (Negative or Positive)']]
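
If the goal is to put profit and employee count on a comparable scale rather than just change the unit, one common option is min-max scaling, which maps each column to the range 0 to 1. The sketch below is only an illustration under that assumption: the sample values are the ones from the table above, and the ' (scaled)' column suffix is a name I made up for the example. It also shows that the division by one million can be written directly on the column, without apply.

```python
import pandas as pd

# Same sample data as above (feedback column omitted for brevity)
df = pd.DataFrame({
    'Company Name': ['Alpha', 'Bravo', 'Charlie', 'Delta'],
    'Company Profit': [1000000, 13000000, 2300000, 130000],
    '# of Employees': [10, 210, 16, 1],
})

# Vectorized unit conversion: same result as the apply/lambda version
df['Company Profit (Million)'] = df['Company Profit'] / 1_000_000

# Min-max scaling: (x - min) / (max - min), computed per column
numeric_cols = ['Company Profit', '# of Employees']
col_min = df[numeric_cols].min()
col_max = df[numeric_cols].max()
scaled = (df[numeric_cols] - col_min) / (col_max - col_min)

# Attach the scaled columns under a hypothetical ' (scaled)' suffix
df = df.join(scaled.add_suffix(' (scaled)'))

print(df)
```

After this, both scaled columns lie between 0 and 1, so a large profit figure no longer dwarfs a small head count when you compare or plot them. Whether min-max scaling, standardization, or something like profit per employee is the right choice depends on the analysis you have in mind.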

Upvotes: 1
