Is there a more optimal way to apply these functions to a dataframe?

Question

I need to apply a function to two columns in a dataframe.
The idea of the function is to split the value on each row of that column and then turn the split values into ints.

There are two types of values:

Dates as strings (e.g. "20.11.2020")
Lists of numbers as strings (e.g. "20,11,49,19,2")

The current way I achieve this is by doing:

def numerize_c(row):
    """
    Delim is comma
    """
    return [int(num) for num in row.split(",")]
    
def numerize_d(row):
    """
    Delim is dot
    """
    return [int(num) for num in row.split(".")]

data["corr_num"] = data["corr_num"].apply(numerize_c)
data["game_date"] = data["game_date"].apply(numerize_d)

I feel like this is a terribly inefficient way to do this. Is there a way to, for example, give the functions an argument for the delimiter?

Or is there a way to format this into a lambda?

SCKU · Accepted Answer

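You could use pd.DataFrame.apply together with pd.Series.str.split and a regular expression to split on '.' or ',' in a single pass. Note that str.split returns lists of strings, so the values still need converting to ints afterwards.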

data.loc[:, ["corr_num", "game_date"]] =\
     data[["corr_num", "game_date"]].apply(lambda x: x.str.split(r',|\.'))
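If you also want the int conversion and a single function that takes the delimiter as an argument, something along these lines might work. This is only a sketch: the names numerize and numerize_any and the sample rows are made up for illustration, and it relies on Series.apply forwarding extra keyword arguments to the function.

```python
import re

import pandas as pd


def numerize(value, delim):
    """Split a string on `delim` and convert each piece to int."""
    return [int(num) for num in value.split(delim)]


def numerize_any(value):
    """Split a string on either ',' or '.' and convert each piece to int."""
    return [int(num) for num in re.split(r"[,.]", value)]


# Made-up sample rows, using the column names from the question.
data = pd.DataFrame(
    {
        "corr_num": ["20,11,49,19,2", "1,2,3"],
        "game_date": ["20.11.2020", "01.02.2021"],
    }
)

# Option 1: one function, with the delimiter passed as a keyword argument.
by_delim = data.copy()
by_delim["corr_num"] = data["corr_num"].apply(numerize, delim=",")
by_delim["game_date"] = data["game_date"].apply(numerize, delim=".")

# Option 2: one regex-based function applied to both columns in a loop.
by_regex = data.copy()
for col in ["corr_num", "game_date"]:
    by_regex[col] = data[col].map(numerize_any)
```

Both options are still row-by-row Python loops under the hood, so they mainly remove the duplicated functions rather than make things faster. A lambda such as lambda v: numerize(v, ",") would do the same job as the keyword argument.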
