Reputation:
I have a column of weights (there are actually 5k weights), a small batch looks like this:
weight
15.00 oz
19.00 oz
2 lb 15.00 oz
1 lb 19.00 oz
What I wanted to do is to convert the whole weight to pounds, something like this:
weight
0.9375 lb
1.1875 lb
2.9375 lb
2.1875 lb
What shall I do to achieve this? What I have tried so far is:
df[['lbs','oz']] = df.Weight.str.split("lb",expand=True)
but this doesn't work, as for rows with no 'lb' unit, the code doesn't work. that looked like this:
pounds ounces
15.00 oz
19.00 oz
2lb 15.00oz
1lb 19.00oz
Upvotes: 1
Views: 802
Reputation: 194
This works, but there's almost certainly a neater 'more pandas' way to do it... This should be fast enough to process 5,000 values.
Imports:
import pandas as pd
Test data set-up (including the data with oz values after the .):
df = pd.DataFrame(["15.00 oz",
"19.00 oz",
"2 lb 15.00 oz",
"1 lb 19.00 oz",
"1 lb 12.80 oz",
"1 lb",
"nothing"],
columns=["weight"])
Produces:
weight
0 15.00 oz
1 19.00 oz
2 2 lb 15.00 oz
3 1 lb 19.00 oz
4 1 lb 12.80 oz
5 1 lb
6 nothing
Define a function to map from the individual lb/oz values to a single lb value. This takes an array of tuples, which may be empty, such as: [(,'15.00')]
or []
or [('1', '12.80')]
(The 'numbers' in the matches are still of type str
at this point):
def lbsFromMatchedNumbers(matchData):
if len(matchData) == 0:
return None
(lbs, oz) = matchData[0]
lbs = float(lbs or 0)
oz = float(oz or 0)
ounces_in_pound = 16.0
return lbs + (oz / ounces_in_pound)
Find all the items in the 'weight' row, and then process them with the function, and assign to new 'lb' column:
matchPattern = "^(?:(\d+) lb ?)?(?:(\d+(?:.\d+)?) oz)?$"
df["lb"] = df["weight"].str.findall(matchPattern).apply(lbsFromMatchedNumbers)
Produces:
weight lb
0 15.00 oz 0.9375
1 19.00 oz 1.1875
2 2 lb 15.00 oz 2.9375
3 1 lb 19.00 oz 2.1875
4 1 lb 12.80 oz 1.8000
5 1 lb 1.0000
6 nothing NaN
Note: This works if there are only lb or oz numbers, as shown in extra rows in the sample data I've used. If there's neither, it produces NaN
.
We're using a regex ('regular expression') to match the portions of the 'weight'
text content using this pattern:
"^(?:(\d+) lb ?)?(?:(\d+(?:.\d+)?) oz)?$"
?
(a , then a ?) looks for either a space, or nothing (the ?
makes it optional)(hello)?
looks for 'hello', but continues anyway if it's not found (due to the ?
?:
, (?:like this)
, group items together, but don't save this as one of the 'matched groups'. In our example only the two numbers are returned, when they are matched (as they are enclosed in plain brackets)Putting that together, this regex basically says:
^
).
, followed by 1-or-more 0-9 digits$
)Upvotes: 1
Reputation: 1
weights = [15, 19] #different weights here
for i in weights:
weight = i / 16
print(weight)
Try this, it should be working and it's a far simpler was of doing it that what you showed. Hope it works for you!
Upvotes: 0