Reputation: 389
I have a dataframe named "sample" with three columns, "birthDay", "birthMonth" and "birthYear", all containing float values, as in the following picture:
I want to add a new column "dateOfBirth" whose entries use integer formatting, to obtain the following data frame:
I tried sample["dateOfBirth"] = sample["birthDay"].map(str) + "/" + sample["birthMonth"].map(str) + "/" + sample["birthYear"].map(str), but the results came out as "11.0/3.0/1988.0" and "4.0/20.0/2001.0".
I would appreciate your help.
Upvotes: 3
Views: 1686
Reputation: 294258
setup
import numpy as np
import pandas as pd

sample = pd.DataFrame([
[3., 11., 1988.],
[20., 4., 2001.],
], columns=['birthDay', 'birthMonth', 'birthYear'])
option 1
make dateOfBirth a series of Timestamps
# dictionary map to rename to canonical date names
# enables convenient conversion using pd.to_datetime
m = dict(birthDay='Day', birthMonth='Month', birthYear='Year')
sample['dateOfBirth'] = pd.to_datetime(sample.rename(columns=m))
sample
option 2
If you insist on a string, use the dt accessor with strftime
# dictionary map to rename to canonical date names
# enables convenient conversion using pd.to_datetime
m = dict(birthDay='Day', birthMonth='Month', birthYear='Year')
# note: the '-' flag (no zero padding) is platform-specific and does not work on Windows
sample['dateOfBirth'] = pd.to_datetime(sample.rename(columns=m)) \
    .dt.strftime('%-m/%-d/%Y')
sample
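Since the '%-m' and '%-d' directives depend on the platform's strftime, here is a minimal sketch of one portable way to get the same unpadded string. It is an alternative I am swapping in, not part of the answer above: it reads the month, day and year off the Timestamps through the dt accessor and joins them as strings. The intermediate name dates is only for illustration.

```python
import pandas as pd

sample = pd.DataFrame([
    [3., 11., 1988.],
    [20., 4., 2001.],
], columns=['birthDay', 'birthMonth', 'birthYear'])

# rename to the names pd.to_datetime expects when assembling from columns
m = dict(birthDay='day', birthMonth='month', birthYear='year')
dates = pd.to_datetime(sample.rename(columns=m))  # "dates" is an illustrative name

# the dt components are integers, so str() gives '11' rather than '11.0'
sample['dateOfBirth'] = (
    dates.dt.month.astype(str) + '/'
    + dates.dt.day.astype(str) + '/'
    + dates.dt.year.astype(str)
)
print(sample)
```

This avoids strftime entirely, so it should behave the same on Linux, macOS and Windows.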
option 3
If you really want to reconstruct from the values, use apply
f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format
sample['dateOfBirth'] = sample.apply(lambda x: f(**x), axis=1)
sample
nulls
In the event that one or more of the date columns has a missing value:
Options 1 and 2 don't require any changes and are the recommended options anyway.
If you want to construct from floats, we can use a boolean mask and loc to assign.
sample = pd.DataFrame([
[3., 11., 1988.],
[20., 4., 2001.],
[20., np.nan, 2001.],
], columns=['birthDay', 'birthMonth', 'birthYear'])
sample
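f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format
# True only for rows where all three date parts are present
mask = sample[['birthDay', 'birthMonth', 'birthYear']].notnull().all(axis=1)
# assign only to complete rows; incomplete rows are left as NaN
sample.loc[mask, 'dateOfBirth'] = sample.apply(lambda x: f(**x), axis=1)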
sample
timing
given sample times 10,000
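The measured timings are not reproduced here. As a rough sketch of how such a comparison could be run, assuming "sample times 10,000" means the two-row frame repeated 10,000 times, something like the following would do. The names base, big, option_1, option_3 and timings are my own for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame([
    [3., 11., 1988.],
    [20., 4., 2001.],
], columns=['birthDay', 'birthMonth', 'birthYear'])

# assumption: "sample times 10,000" = the two rows repeated 10,000 times
big = pd.concat([base] * 10_000, ignore_index=True)

m = dict(birthDay='day', birthMonth='month', birthYear='year')
f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format


def option_1(df):
    # assemble Timestamps from the renamed columns
    return pd.to_datetime(df.rename(columns=m))


def option_3(df):
    # format each row into a string with apply
    return df.apply(lambda x: f(**x), axis=1)


# best of three single runs for each approach
timings = {
    fn.__name__: min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    for fn in (option_1, option_3)
}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s')
```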
Upvotes: 2
Reputation: 61967
Before you begin your string concatenation, convert all columns to int and then to str.
df = df.astype(int).astype(str)
df['dateOfBirth'] = df['birthMonth'] + '/' + df['birthDay'] + '/' + df['birthYear']
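One caveat: astype(int) raises an error if any of the columns contains NaN. A minimal sketch of one way around that, assuming you are happy to leave dateOfBirth missing for incomplete rows, is to convert only the complete rows and let index alignment fill in the rest. The names cols, complete and parts are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [3., 11., 1988.],
    [20., 4., 2001.],
    [20., np.nan, 2001.],
], columns=['birthDay', 'birthMonth', 'birthYear'])

cols = ['birthDay', 'birthMonth', 'birthYear']

# rows where every date part is present
complete = df[cols].notna().all(axis=1)

# convert only those rows: float -> int -> str, so 3.0 becomes '3'
parts = df.loc[complete, cols].astype(int).astype(str)

# column assignment aligns on the index, so rows missing from "parts" become NaN
df['dateOfBirth'] = parts['birthMonth'] + '/' + parts['birthDay'] + '/' + parts['birthYear']
print(df)
```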
Upvotes: 1