Reputation: 21
I have the following question I would like to ask:
EXAMPLE:
INITIAL DATA
DATE EMAIL SALE
10/2/2020 [email protected] $ 501.00
10/2/2020 [email protected] $ 100.00
10/2/2020 [email protected] $ 50.00
10/2/2020 [email protected] $ 32.00
10/2/2020 [email protected] $ 501.00
10/3/2020 [email protected] $ 45.00
10/3/2020 [email protected] $ 75.00
10/3/2020 [email protected] $ 100.00
10/4/2020 [email protected] $ 200.00
DESIRED OUTPUT
DATE EMAIL SALE CHECKED LEFT?
10/2/2020 [email protected] $501.00 $500.00 $1.00
10/2/2020 [email protected] $100.00 $100.00 $0.00
10/2/2020 [email protected] $50.00 $50.00 $0.00
10/2/2020 [email protected] $32.00 $30.00 $2.00
10/2/2020 [email protected] $501.00 $500.00 $1.00
10/3/2020 [email protected] $45.00 $40.00 $5.00
10/3/2020 [email protected] $75.00 $70.00 $5.00
10/3/2020 [email protected] $100.00 $100.00 $0.00
10/4/2020 [email protected] $200.00 $200.00 $0.00
I want to create new columns as a counter to mark if it is a smooth number that ends in a around number - EX. 10, 500 it will be counted and put in a column called checked. Then the left over remaining sales will be put in a column called left? and a counter to indicate it was counted or not.
I am new to manipulating data in python so I'm unsure what I can even google to do this.
Any help would be appreciated or if you guys have suggestions as well!
Thanks!
Upvotes: 1
Views: 58
Reputation: 153500
Try this:
df['SalesNum'] = df['SALE'].str.strip('$').astype(float)
df['MARKED'] = (df['SalesNum'] // 10) * 10
df['LEFT$'] = df['SalesNum'] % df['MARKED']
df['COUNTED'] = (df['MARKED'] > 0).astype(int)
df
Output:
DATE EMAILID SALE SalesNum MARKED LEFT$ COUNTED
0 10/2/2020 [email protected] $ 501.00 501.0 500.0 1.0 1
1 10/2/2020 [email protected] $ 100.00 100.0 100.0 0.0 1
2 10/2/2020 [email protected] $ 50.00 50.0 50.0 0.0 1
3 10/2/2020 [email protected] $ 32.00 32.0 30.0 2.0 1
4 10/2/2020 [email protected] $ 501.00 501.0 500.0 1.0 1
5 10/3/2020 [email protected] $ 45.00 45.0 40.0 5.0 1
6 10/3/2020 [email protected] $ 75.00 75.0 70.0 5.0 1
7 10/3/2020 [email protected] $ 100.00 100.0 100.0 0.0 1
8 10/4/2020 [email protected] $ 200.00 200.0 200.0 0.0 1
Details:
.str
, the string accessor and strip
and casting that
result as a float datatype and assign this to 'SalesNum' column.//
the floor divide to get an integer for the
number of times 10 will divide evenly in to the 'SalesNum' column.
And we multiple that result by 10 to get 'MARKED'.%
the modulo operator to get the remainder after we
divide 'SalesNum' by 'MARKED'.Upvotes: 2
Reputation: 16683
You could convert to string and do some string manipulation replacing the last digit with 0 and then convert back to int"
s
that essentially counts how many digits there are, which you will use in the next step t oadd that number of zeros.MARKED
column, I took the first digit with .str[0]
and added the required number of zeros from s
.# df['SALE'] = df['SALE'].replace(['\$', '\..*'], '', regex=True).str.strip().astype(int)
s = (df['SALE'].astype(str).map(len) - 1).apply(lambda x: x * '0')
df['MARKED'] = (df['SALE'].astype(str).str[0] + s).astype(int)
df['LEFT$'] = df['SALE'] - df['MARKED']
df
Out[1]:
DATE EMAILID SALE MARKED LEFT$
0 10/2/2020 [email protected] 501 500 1
1 10/2/2020 [email protected] 100 100 0
2 10/2/2020 [email protected] 50 50 0
3 10/2/2020 [email protected] 32 30 2
4 10/2/2020 [email protected] 501 500 1
5 10/3/2020 [email protected] 45 40 5
6 10/3/2020 [email protected] 75 70 5
7 10/3/2020 [email protected] 100 100 0
8 10/4/2020 [email protected] 200 200 0
Upvotes: 0