Concatenating the values of two pandas Series (that are not of str type) is too slow (linear complexity)

Question

I have a pandas.DataFrame with two (or more) Series that are not of type str (type float, for instance). The output I want is a Series of type str that is the concatenation of my float Series, joined with a given separator (for instance "-").

The following function, build_df_ex, builds the example dataframe:

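import numpy as np
import pandas as pd

def build_df_ex(n):
    # s1 holds non-positive floats, s2 non-negative floats, n rows each
    df_ex = pd.DataFrame({"s1": -abs(np.random.rand(int(n))),
                          "s2": +abs(np.random.rand(int(n)))})
    return df_ex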

The function convert_to_str_and_add performs the desired concatenation:

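def convert_to_str_and_add(df, sep="-"):
    # Convert every column to str (this is the slow step)
    df = df.astype(str)
    # Element-wise string concatenation with the separator in between
    s = df.s1 + sep + df.s2
    return s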
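To make the expected output concrete, here is a minimal sketch that applies the function to a tiny hand-built frame. The values and the name df_small are made up for illustration and are not from my real data.

```python
import pandas as pd

def convert_to_str_and_add(df, sep="-"):
    # Same function as above, repeated so the snippet runs on its own
    df = df.astype(str)
    return df.s1 + sep + df.s2

# Hypothetical two-row frame with values that have exact string forms
df_small = pd.DataFrame({"s1": [-0.5, -0.25], "s2": [0.5, 0.75]})
result = convert_to_str_and_add(df_small)
print(result.tolist())
```

Each element of the result is a single string made of the s1 value, the separator, and the s2 value.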

My main problem is that the runtime of this function grows linearly with the number of rows (see the graph below), which is prohibitive in my case. The main bottleneck of the function is the conversion to str type. I have tried going the NumPy way but did not see any gain in performance, probably because that is what pandas is already doing under the hood.

Does anyone have a solution that would make this operation faster?
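For reference, the timings can be split per step with a rough sketch like the one below. The helper name time_steps is hypothetical, and the NumPy variant shown (astype(str) followed by np.char.add) is only one possible reading of "the NumPy way", not necessarily the exact code I tried.

```python
import time
import numpy as np
import pandas as pd

def time_steps(n, sep="-"):
    """Return the seconds spent in each step for a frame of n rows."""
    df = pd.DataFrame({"s1": -np.random.rand(n), "s2": np.random.rand(n)})

    t0 = time.perf_counter()
    df_str = df.astype(str)              # float -> str conversion
    t1 = time.perf_counter()
    _ = df_str.s1 + sep + df_str.s2      # string concatenation only
    t2 = time.perf_counter()

    # Assumed NumPy variant: convert the raw arrays, then join element-wise
    a = df.s1.to_numpy().astype(str)
    b = df.s2.to_numpy().astype(str)
    _ = np.char.add(np.char.add(a, sep), b)
    t3 = time.perf_counter()

    return {"astype": t1 - t0, "concat": t2 - t1, "numpy": t3 - t2}

for n in (10_000, 100_000):
    print(n, time_steps(n))
```

Comparing the "astype" and "concat" entries for increasing n is a simple way to check which step dominates on a given machine.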

Thanks a lot
