Brad Johnson
Brad Johnson

Reputation: 53

Splitting a mixed number string from a dataframe column and converting it to a float

I have a dataframe with a column of strings that are a mix of whole numbers and mixed fractions. I would like to convert column 'y' to floats.

x            y         z
0            4      Info
1        8 1/2      Info
2          3/4      Info
3           10      Info
4            4      Info
5        6 1/4      Info

The logic I am considering is to split column 'y' by ' ' and '/' to create three separate columns that would look like this.

x         base        b        c         z
0            4        0        0      Info
1            8        1        2      Info
2            0        3        4      Info
3           10        0        0      Info
4            4        0        0      Info
5            6        1        4      Info

From here I could

def convertReplace(df):
    convert = lambda x: float(x)
    df['base'].apply(convert)
    df['b'].apply(convert)
    df['c'].apply(convert)
    decimal = lambda x,y: x/y        
    try:
        df['d'] = decimal(df['b'],df['c'])
        df['y'] = df['base'] + df['d']
    except:
        df['y'] = df['base']
    return df

This might work but I can't get the column to split using the method found here.

df = pd.DataFrame(df.y.str.split(' ',1).str.split('/',1).tolist(),columns = ['base','b','c'])

The error says it expects 3 arguments each time when it may be 1, 2, or 3. Even this thread doesn't use multiple separators.

The actual dataframe has over 400k rows. Efficiency would be great but I'm more interested in just getting it done. Is this logic correct or is there a more concise way to do this? Any help is appreciated.

Upvotes: 2

Views: 953

Answers (1)

vk1011
vk1011

Reputation: 7179

You could try the fractions module. Here's a one-liner:

import fractions
df['y_float'] = df['y'].apply(lambda frac: float(sum([fractions.Fraction(x) for x in frac.split()])))

This gives:

       y     z  y_float
0      4  Info     4.00
1  8 1/2  Info     8.50
2    3/4  Info     0.75
3     10  Info    10.00
4      4  Info     4.00
5  6 1/4  Info     6.25

[EDIT] Corrected version accounting for negative fractions, as well as invalid text:

I realized the above approach would not work for negative fractions, so here is that taken in to account. As it turns out, a one-liner for this would be very tricky!

def get_sign(num_str):
    """
    Verify the sign of the fraction
    """
    return 1-2*num_str.startswith('-')

def is_valid_fraction(text_str):
    """
    Check if the string provided is a valid fraction.
    Here I just used a quick example to check for something of the form of the fraction you have. For something more robust based on what your data can potentially contain, a regex approach would be better.
    """
    return text_str.replace(' ', '').replace('-', '').replace('/', '').isdigit()

def convert_to_float(text_str):
    """
    Convert an incoming string to a float if it is a fraction
    """
     if is_valid_fraction(text_str):
         sgn = get_sign(text_str)
         return sgn*float(sum([abs(fractions.Fraction(x)) for x in text_str.split()]))
     else:
         return pd.np.nan # Insert a NaN if it is invalid text

So now you will have this:

>>> df['y_float'] = df['y'].apply(lambda frac: convert_to_float(frac))
>>> df
              y     z  y_float
0             4  Info     4.00
1         8 1/2  Info     8.50
2           3/4  Info     0.75
3            10  Info    10.00
4             0  Info     0.00
5         6 1/4  Info     6.25
6        -3 2/5  Info    -3.40
7          -4/5  Info    -0.80
8  gibberish100  Info      NaN

Upvotes: 2

Related Questions