Reputation: 10051
How can I extract the area values from the address column in the following dataframe?
address quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 5 45
Please note that the area is the number immediately before either ㎡ or square metre.
The desired output looks like this:
address area quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 206.0 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 115.0 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 39.0 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 470.0 5 45
Upvotes: 0
Views: 200
Reputation: 82785
Use str.extract with a regex lookahead for the unit.
Example:
import pandas as pd

df = pd.DataFrame({'address': ['711-2880 Nulla St. Mankato Mississippi 96.5㎡', 'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡', '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616', 'Ap #867-859 Sit Rd. Azusa New York 39 square metre', '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392']})
# Capture the number (integer or decimal) that directly precedes ㎡ or "square metre"
df['area'] = df['address'].str.extract(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)")
print(df)
Output:
address area
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska... 206
2 606-3727 Ullamcorper. Street Roseville NH 115㎡... 115
3 Ap #867-859 Sit Rd. Azusa New York 39 square m... 39
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492... 470
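Note that str.extract returns strings, while the desired output in the question shows floats, with area placed right after address. A minimal sketch of one way to get there, reusing the same pattern; the quantity and price values are copied from the question, and the choice of expand=False, astype(float) and DataFrame.insert is my assumption about what is wanted, not something the question spells out:

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "711-2880 Nulla St. Mankato Mississippi 96.5㎡",
        "P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡",
        "606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616",
        "Ap #867-859 Sit Rd. Azusa New York 39 square metre",
        "7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392",
    ],
    "quantity": [2, 3, 5, 3, 5],
    "price": [20, 13, 23, 32, 45],
})

# Same pattern as above: a number followed (after optional spaces)
# by either ㎡ or the words "square metre".
pattern = r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)"

# expand=False returns a Series for a single capture group;
# astype(float) converts the extracted strings to numbers.
area = df["address"].str.extract(pattern, expand=False).astype(float)

# Place the new column at position 1, i.e. right after "address".
df.insert(1, "area", area)
print(df)
```

Rows where the pattern does not match come back as NaN from str.extract, so the float conversion still works on them.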
Upvotes: 1