Reputation: 682
I have a pandas.DataFrame with two (or more) Series that are not of type str (type float, for instance). The output I want is a Series of type str that is the concatenation of my float Series, joined with a given separator (for instance "-").
The following function, build_df_ex, builds the example DataFrame:
import numpy as np
import pandas as pd

def build_df_ex(n):
    # s1: values in (-1, 0]; s2: values in [0, 1)
    df_ex = pd.DataFrame({"s1": -abs(np.random.rand(int(n))),
                          "s2": +abs(np.random.rand(int(n)))})
    return df_ex
The function convert_to_str_and_add performs the desired concatenation:
def convert_to_str_and_add(df, sep="-"):
    df = df.astype(str)  # convert every column to str (the slow step)
    s = df.s1 + sep + df.s2
    return s
My main problem is that this function has linear complexity (see the graph below), which is prohibitive in my case. The main bottleneck of the function is the conversion to str type. I have tried to go the numpy way, but I didn't see any gain in performance, probably because that is what pandas is already doing under the hood.
Does anyone have a solution that would make this operation faster?
Thanks a lot
Upvotes: 0
Views: 49
Reputation: 117771
You cannot escape linear performance: every value has to be converted to a string at least once. Your only hope is to show more of what you plan to do with the result, so that extra work can perhaps be avoided. What you have written is perfectly reasonable. You can try the following and see whether it performs better (but I wouldn't be surprised if it doesn't).
df.apply(('{0[0]}' + sep + '{0[1]}').format, axis=1)
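Another variant worth timing is a plain Python list comprehension over the two columns. This is only a minimal sketch, not a guaranteed speed-up: the name concat_with_comprehension is hypothetical, and it assumes the column names s1 and s2 from the question. It is still linear in the number of rows; at best it lowers the constant factor by skipping the intermediate all-string DataFrame.

```python
import pandas as pd


def concat_with_comprehension(df, sep="-"):
    # Hypothetical helper: assumes the frame has columns "s1" and "s2".
    # tolist() yields plain Python floats, and the f-string formats each
    # one the same way str() would, then joins the pair with the separator.
    values = [f"{a}{sep}{b}"
              for a, b in zip(df["s1"].tolist(), df["s2"].tolist())]
    # Keep the original index so the result aligns with the input frame.
    return pd.Series(values, index=df.index)
```

To compare it with convert_to_str_and_add, time both on the same frame (for example with timeit) at the sizes you actually care about, since the outcome may depend on the pandas version and on the data.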
Upvotes: 1