Dustin Yates
Dustin Yates

Reputation: 1

How do you strip out only the integers of a column in pandas?

I am trying to strip out only the numeric values--which is the first 1 or 2 digits. Some values in the column contain pure strings and others contain special characters. See pic for the value count:

enter image description here

I have tried multiple methods:

breaks['_Size'] = breaks['Size'].fillna(0)
breaks[breaks['_Size'].astype(str).str.isdigit()]
breaks['_Size'] = breaks['_Size'].replace('\*','',regex=True).astype(float)
breaks['_Size'] = breaks['_Size'].str.extract('(\d+)').astype(int)
breaks['_Size'].map(lambda x: x.rstrip('aAbBcC'))

None are working. The dtype is object. To be clear, I am attempting to make a new column with only the digits (as an int/float) and if I could convert the fraction to a decimal that would be bonus

Upvotes: 0

Views: 71

Answers (1)

Gabriela Melo
Gabriela Melo

Reputation: 619

This works for dividing fractions and also allows for extra numbers to be present in the string (it returns you just the first sequence of numbers):

In [60]: import pandas as pd                                                                                                                                                                                  

In [61]: import re                                                                                                                                                                                            

In [62]: df = pd.DataFrame([0, "6''", '7"', '8in', 'text', '3/4"', '1a3'], columns=['_Size'])                                                                                                                 

In [63]: df                                                                                                                                                                                                   
Out[63]: 
  _Size
0     0
1   6''
2    7"
3   8in
4  text
5  3/4"
6   1a3

In [64]: def cleaning_function(row): 
    ...:     row = str(row) 
    ...:     fractions = re.findall(r'(\d+)/(\d+)', row) 
    ...:     if fractions: 
    ...:         return float(int(fractions[0][0])/int(fractions[0][1])) 
    ...:     numbers = re.findall(r'[0-9]+', str(row)) 
    ...:     if numbers: 
    ...:         return numbers[0] 
    ...:     return 0 
    ...:                                                                                                                                                                                                      

In [65]: df._Size.apply(cleaning_function)                                                                                                                                                                    
Out[65]: 
0       0
1       6
2       7
3       8
4       0
5    0.75
6       1
Name: _Size, dtype: object

Upvotes: 1

Related Questions