Henry

Reputation: 77

Python vaex: merging columns into a new column while converting int to float

I was able to write a function that merges several columns into a new column, but I cannot convert an int column to float before casting it to string for the merge.

In the new merged column, I would like the integers to carry a trailing ".00000".

My end goal is to use the merged column as a key for joining two vaex DataFrames on multiple columns. As it seems vaex only accepts a single column as the join key, I need to build a combined key column.

The int-to-float conversion is needed in case a column is int in one vaex DataFrame and float in the other.

The code is below.

The function new_column_by_column_merging works, but new_column_by_column_merging2 does not. Is there any way to make it work?

import vaex
import pandas as pd  
import numpy as np

def new_column_by_column_merging(df, columns=None):
    if columns is None:
        columns = df.get_column_names()
    if type(columns) is str:
        df['merged_column_key'] = df[columns]
        return df

    df['merged_column_key'] = np.array(['']*len(df))
    for col in columns:
        df['merged_column_key'] = df['merged_column_key'] + '_' + df[col].astype('string')
    return df

def new_column_by_column_merging2(df, columns=None):
    if columns is None:
        columns = df.get_column_names()
    if type(columns) is str:
        df['merged_column_key'] = df[columns]
        return df

    df['merged_column_key'] = np.array(['']*len(df))
    for col in columns:
        try:
            df[col] = df[col].astype('float')
        except:
            print('fail to convert to float')
        df['merged_column_key'] = df['merged_column_key'] + '_' + df[col].astype('string')
    return df


pandas_df = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Last Name': ['Johnson', 'Cameron', 'Biden', 'Washington'], 'Age': [20, 21, 19, 18], 'Weight': [60.0, 61.0, 62.0, 63.0]})  
print('pandas_df is')
print(pandas_df)  

df = vaex.from_pandas(df=pandas_df, copy_index=False)

df1 = new_column_by_column_merging(df, ['Name', 'Age', 'Weight'])

print('new_column_by_column_merging returns')
print(df1)

df2 = new_column_by_column_merging2(df, ['Name', 'Age', 'Weight'])

print('new_column_by_column_merging2 returns')
print(df2)

Upvotes: 0

Views: 263

Answers (1)

Joco

Reputation: 813

It looks like the vaex expression system does not always play nicely with try/except checks, so you need to be careful with the dtypes. One way of handling this:

import numpy as np  # needed for np.array below
import vaex

df = vaex.datasets.titanic()  # dataframe for testing

def new_column_by_column_merging2(df, columns=None):
    if columns is None:
        columns = df.get_column_names()
    if type(columns) is str:
        df['merged_column_key'] = df[columns]
        return df

    df['merged_column_key'] = np.array(['']*len(df))
    for col in columns:
        # Strings cannot be cast to float, so only cast non-string columns
        if not df[col].is_string():
            df[col] = df[col].astype('float')
        df['merged_column_key'] = df['merged_column_key'] + '_' + df[col].astype('string')
    return df


new_column_by_column_merging2(df)   # should work

Basically, I replaced the try/except statement with an explicit check for strings (since they can't be converted to floats). You might have to extend that check to cover other dtypes, such as datetime, if needed. Hope this helps.
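To illustrate why the float cast matters for the join key, here is a minimal sketch in plain Python, independent of vaex. The helper names normalize_key_part and build_key and the ".5f" format are assumptions made for this illustration; they are not part of vaex, and the string that vaex itself produces for a float may look different.

```python
def normalize_key_part(value, float_format=".5f"):
    """Render one key component as text; numbers are formatted as floats."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return format(float(value), float_format)
    return str(value)


def build_key(row, sep="_"):
    """Join the normalized components of one row into a single key string."""
    return sep.join(normalize_key_part(v) for v in row)


# The same logical row, once with an int age and once with a float age
left_row = ("Tom", 20, 60.0)
right_row = ("Tom", 20.0, 60.0)

# Naive string conversion yields different keys for 20 and 20.0
naive_left = "_".join(str(v) for v in left_row)
naive_right = "_".join(str(v) for v in right_row)
print(naive_left == naive_right)                    # False

# Normalizing numbers to one float format makes the keys agree
print(build_key(left_row) == build_key(right_row))  # True
print(build_key(left_row))                          # Tom_20.00000_60.00000
```

The same idea applies to the vaex version: as long as both DataFrames cast their numeric columns to the same dtype before the string conversion, the combined keys should line up for the join.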

Upvotes: 0
