Reputation: 345
I need to apply a function to two columns in a dataframe.
The idea of the function is to split the value on each row of that column and then turn the split values into ints.
There are two types of values:
"20.11.2020"
"20,11,49,19,2"
The current way I achieve this is by doing:
def numerize_c(row):
    """
    Delim is comma
    """
    return [int(num) for num in row.split(",")]

def numerize_d(row):
    """
    Delim is dot
    """
    return [int(num) for num in row.split(".")]

data["corr_num"] = data["corr_num"].apply(numerize_c)
data["game_date"] = data["game_date"].apply(numerize_d)
I feel like this is a terribly inefficient way to do this. Is there a way to, for example, give the function an argument for the delimiter?
Or is there a way to write this as a lambda?
Upvotes: 0
Views: 46
Reputation: 833
You could use pd.DataFrame.apply together with pd.Series.str.split and a regular expression to split on '.' or ',' all at once.
data.loc[:, ["corr_num", "game_date"]] =\
data[["corr_num", "game_date"]].apply(lambda x: x.str.split(r',|\.'))
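Note that the split above yields lists of strings, while the question asks for ints. A minimal sketch of one way to add the conversion, assuming every piece is a valid integer string (the sample frame here is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data in the two formats from the question.
data = pd.DataFrame({
    "corr_num": ["20,11,49,19,2", "1,2,3"],
    "game_date": ["20.11.2020", "01.02.2021"],
})

for col in ["corr_num", "game_date"]:
    # Multi-character patterns are treated as regular expressions,
    # so this splits on either a comma or a dot.
    parts = data[col].str.split(r"[,.]")
    # Convert each list of strings to a list of ints.
    data[col] = parts.apply(lambda items: [int(item) for item in items])
```

This still loops in Python for the int conversion, so it is more about brevity than raw speed.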
Upvotes: 1
Reputation: 106
An improvement would be to use data['corr_num'].str.split(','). This built-in is much faster than apply. Note that it returns lists of strings, so the values still need to be converted to ints afterwards.
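As a rough sketch of how this could be combined with the delimiter argument asked about in the question: the helper name split_to_ints, the delimiters mapping, and the sample frame are all made up for illustration, and the sketch assumes every piece is a valid integer string.

```python
import pandas as pd

def split_to_ints(series, delim):
    """Split each string in the Series on delim and convert the pieces to ints."""
    return series.str.split(delim).map(lambda parts: [int(p) for p in parts])

# Hypothetical sample data in the two formats from the question.
data = pd.DataFrame({
    "corr_num": ["20,11,49,19,2", "1,2,3"],
    "game_date": ["20.11.2020", "01.02.2021"],
})

# One delimiter per column; single characters are matched literally.
delimiters = {"corr_num": ",", "game_date": "."}
for col, delim in delimiters.items():
    data[col] = split_to_ints(data[col], delim)
```

Keeping the delimiters in a dict means a new column only needs one extra entry rather than another function.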
Upvotes: 1