Natig Aliyev

Reputation: 389

Change float values into integer values and then concatenate in pandas dataframe

I have a dataframe named "sample" with three columns, "birthDay", "birthMonth" and "birthYear", all containing float values, as in the following picture:

[image: the sample dataframe]

I want to add a new column "dateOfBirth" whose entries are built from integer values, to obtain the following data frame:

[image: the desired dataframe with the dateOfBirth column]

I tried sample["dateOfBirth"] = sample["birthDay"].map(str) + "/" + sample["birthMonth"].map(str) + "/" + sample["birthYear"].map(str), but the results were "11.0/3.0/1988.0" and "4.0/20.0/2001.0".

I would appreciate your help.

Upvotes: 3

Views: 1686

Answers (2)

piRSquared

Reputation: 294258

setup

import pandas as pd

sample = pd.DataFrame([
        [3., 11., 1988.],
        [20., 4., 2001.],
    ], columns=['birthDay', 'birthMonth', 'birthYear'])

option 1
make dateOfBirth a series of Timestamps

# dictionary map to rename to canonical date names
# enables convenient conversion using pd.to_datetime
m = dict(birthDay='Day', birthMonth='Month', birthYear='Year')
sample['dateOfBirth'] = pd.to_datetime(sample.rename(columns=m))

sample



option 2
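If you insist on a string,
use the dt accessor with strftime.
Note that the '%-m' and '%-d' codes (no zero padding) depend on the platform's C library and are not available on Windows.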

# dictionary map to rename to canonical date names
# enables convenient conversion using pd.to_datetime
m = dict(birthDay='Day', birthMonth='Month', birthYear='Year')

sample['dateOfBirth'] = pd.to_datetime(sample.rename(columns=m)) \
                          .dt.strftime('%-m/%-d/%Y')

sample

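Since not every platform's strftime accepts '%-m' and '%-d' (Windows rejects them), a portable variant can build the string from the dt accessor's month, day and year components instead. This is a minimal sketch, assuming no missing values; the names dates and date_strings are illustrative and not part of the answer above.

```python
import pandas as pd

sample = pd.DataFrame([
        [3., 11., 1988.],
        [20., 4., 2001.],
    ], columns=['birthDay', 'birthMonth', 'birthYear'])

# rename to the names pd.to_datetime expects;
# cast to int first (assumption: no missing values in these columns)
m = dict(birthDay='day', birthMonth='month', birthYear='year')
dates = pd.to_datetime(sample.rename(columns=m).astype(int))

# month/day/year without zero padding, built from integer components
date_strings = (dates.dt.month.astype(str) + '/'
                + dates.dt.day.astype(str) + '/'
                + dates.dt.year.astype(str))
sample['dateOfBirth'] = date_strings
```

If a row had a missing part, its date would be NaT and the components would no longer be whole numbers, so this variant is only suitable for complete data.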


option 3
If you really want to reconstruct from the values
using apply

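# format each float with zero decimal places, in month/day/year order
f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format
# axis=1 applies the function to each row
sample['dateOfBirth'] = sample.apply(lambda x: f(**x), axis=1)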
sample

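To see what option 3 does for a single row, the format string can be tried on a plain dict standing in for one row of the dataframe. This is only an illustrative sketch; the dict named row is made up from the first row of the setup data.

```python
# the 0.0f spec formats a float with zero decimal places, so 11.0 becomes '11'
f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format

# stand-in for one dataframe row: apply with axis=1 passes each row as a Series,
# and ** unpacks its labels as keyword arguments in the same way
row = {'birthDay': 3.0, 'birthMonth': 11.0, 'birthYear': 1988.0}
result = f(**row)
print(result)  # 11/3/1988
```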


nulls
In the event that one or more of the date columns has a missing value:
Options 1 and 2 don't require any changes and are the recommended options anyway.
If you want to construct from floats, we can use a boolean mask and loc to assign.

import numpy as np

sample = pd.DataFrame([
        [3., 11., 1988.],
        [20., 4., 2001.],
        [20., np.nan, 2001.],
    ], columns=['birthDay', 'birthMonth', 'birthYear'])

sample


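f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format
# True only for rows where all three date parts are present
mask = sample[['birthDay', 'birthMonth', 'birthYear']].notnull().all(axis=1)
# loc aligns on the index, so rows outside the mask stay NaN
sample.loc[mask, 'dateOfBirth'] = sample.apply(lambda x: f(**x), axis=1)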
sample

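The mask matters because the format string does not fail on a missing value; it formats NaN as the text 'nan'. A small pure-Python sketch of that behaviour (the dict named row is illustrative, mirroring the row with the missing month):

```python
f = '{birthMonth:0.0f}/{birthDay:0.0f}/{birthYear:0.0f}'.format

# stand-in for the row whose birthMonth is missing
row = {'birthDay': 20.0, 'birthMonth': float('nan'), 'birthYear': 2001.0}
unmasked = f(**row)
print(unmasked)  # nan/20/2001
```

Assigning through loc with the mask leaves that row's dateOfBirth as a missing value instead of a misleading string.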


timing
given sample: [image: timing results]

timing
given sample times 10,000: [image: timing results]

Upvotes: 2

Ted Petrou

Reputation: 61967

Before you begin your string concatenation, convert all columns to int and then to str.

df = df.astype(int).astype(str)
df['dateOfBirth'] = df['birthMonth'] + '/' + df['birthDay'] + '/' + df['birthYear']
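
A variant of the same idea converts only the three date columns, so the original float columns (and any other columns in the dataframe) are left untouched. This is a sketch on the sample data from the question; astype(int) raises an error on missing values, so it assumes the columns are complete. The names cols and parts are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
        [3., 11., 1988.],
        [20., 4., 2001.],
    ], columns=['birthDay', 'birthMonth', 'birthYear'])

# convert a copy of just the date columns: float -> int -> str
cols = ['birthDay', 'birthMonth', 'birthYear']
parts = df[cols].astype(int).astype(str)

# concatenate the string pieces; df's float columns are unchanged
df['dateOfBirth'] = parts['birthMonth'] + '/' + parts['birthDay'] + '/' + parts['birthYear']
```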

Upvotes: 1
