user13904208
user13904208

Reputation: 21

Python - Manipulating numbers for a round number and create new column?

I have the following question I would like to ask:

EXAMPLE:

INITIAL DATA

DATE        EMAIL                SALE
10/2/2020   [email protected]        $  501.00 
10/2/2020   [email protected]        $  100.00 
10/2/2020   [email protected]       $  50.00 
10/2/2020   [email protected]    $  32.00 
10/2/2020   [email protected]        $  501.00 
10/3/2020   [email protected]       $  45.00 
10/3/2020   [email protected]        $  75.00 
10/3/2020   [email protected]    $  100.00 
10/4/2020   [email protected]    $  200.00  

DESIRED OUTPUT

DATE       EMAIL             SALE     CHECKED LEFT?   
10/2/2020  [email protected]     $501.00  $500.00 $1.00   
10/2/2020  [email protected]     $100.00  $100.00 $0.00   
10/2/2020  [email protected]    $50.00   $50.00  $0.00   
10/2/2020  [email protected] $32.00   $30.00  $2.00   
10/2/2020  [email protected]     $501.00  $500.00 $1.00   
10/3/2020  [email protected]    $45.00   $40.00  $5.00   
10/3/2020  [email protected]     $75.00   $70.00  $5.00   
10/3/2020  [email protected] $100.00  $100.00 $0.00   
10/4/2020  [email protected] $200.00  $200.00 $0.00   

I want to create new columns as a counter to mark if it is a smooth number that ends in a around number - EX. 10, 500 it will be counted and put in a column called checked. Then the left over remaining sales will be put in a column called left? and a counter to indicate it was counted or not.

I am new to manipulating data in python so I'm unsure what I can even google to do this.

Any help would be appreciated or if you guys have suggestions as well!

Thanks!

Upvotes: 1

Views: 58

Answers (2)

Scott Boston
Scott Boston

Reputation: 153500

Try this:

df['SalesNum'] = df['SALE'].str.strip('$').astype(float)
df['MARKED'] =  (df['SalesNum'] // 10) * 10
df['LEFT$'] = df['SalesNum'] % df['MARKED']
df['COUNTED'] = (df['MARKED'] > 0).astype(int)
df

Output:

        DATE            EMAILID       SALE  SalesNum  MARKED  LEFT$  COUNTED
0  10/2/2020      [email protected]  $  501.00     501.0   500.0    1.0        1
1  10/2/2020      [email protected]  $  100.00     100.0   100.0    0.0        1
2  10/2/2020     [email protected]   $  50.00      50.0    50.0    0.0        1
3  10/2/2020  [email protected]   $  32.00      32.0    30.0    2.0        1
4  10/2/2020      [email protected]  $  501.00     501.0   500.0    1.0        1
5  10/3/2020     [email protected]   $  45.00      45.0    40.0    5.0        1
6  10/3/2020      [email protected]   $  75.00      75.0    70.0    5.0        1
7  10/3/2020  [email protected]  $  100.00     100.0   100.0    0.0        1
8  10/4/2020  [email protected]  $  200.00     200.0   200.0    0.0        1

Details:

  • First let's strip the string in 'SALES' column of the dollar sign using the .str, the string accessor and strip and casting that result as a float datatype and assign this to 'SalesNum' column.
  • Next, we can use // the floor divide to get an integer for the number of times 10 will divide evenly in to the 'SalesNum' column. And we multiple that result by 10 to get 'MARKED'.
  • Now, we can use % the modulo operator to get the remainder after we divide 'SalesNum' by 'MARKED'.
  • Lastly, we assign 1 to counted if any value greater than zero was assigned to 'MARKED'.

Upvotes: 2

David Erickson
David Erickson

Reputation: 16683

You could convert to string and do some string manipulation replacing the last digit with 0 and then convert back to int"

  1. I created a series s that essentially counts how many digits there are, which you will use in the next step t oadd that number of zeros.
  2. When creating the MARKED column, I took the first digit with .str[0] and added the required number of zeros from s.

# df['SALE'] = df['SALE'].replace(['\$', '\..*'], '', regex=True).str.strip().astype(int)

s = (df['SALE'].astype(str).map(len) - 1).apply(lambda x: x * '0')
df['MARKED'] = (df['SALE'].astype(str).str[0]  + s).astype(int)
df['LEFT$'] = df['SALE'] - df['MARKED']
df
Out[1]: 
        DATE            EMAILID  SALE  MARKED  LEFT$
0  10/2/2020      [email protected]   501     500      1
1  10/2/2020      [email protected]   100     100      0
2  10/2/2020     [email protected]    50      50      0
3  10/2/2020  [email protected]    32      30      2
4  10/2/2020      [email protected]   501     500      1
5  10/3/2020     [email protected]    45      40      5
6  10/3/2020      [email protected]    75      70      5
7  10/3/2020  [email protected]   100     100      0
8  10/4/2020  [email protected]   200     200      0

Upvotes: 0

Related Questions