SCool

Reputation: 3375

Concatenate multiple column strings into one column

I have the following dataframe with firstname and surname. I want to create a column fullname.

import numpy as np
import pandas as pd
df1 = pd.DataFrame({'firstname':['jack','john','donald'],
                  'lastname':[np.nan,'obrien','trump']})

print(df1)

  firstname lastname
0      jack      NaN
1      john   obrien
2    donald    trump

This works if there are no NaN values:

df1['fullname'] = df1['firstname']+df1['lastname']

However, since there are NaNs in my dataframe, I decided to cast to string first. But that puts the string representation of the whole column into every row of fullname:

df1['fullname'] = str(df1['firstname'])+str(df1['lastname'])


  firstname lastname                                           fullname
0      jack      NaN  0      jack\n1      john\n2    donald\nName: f...
1      john   obrien  0      jack\n1      john\n2    donald\nName: f...
2    donald    trump  0      jack\n1      john\n2    donald\nName: f...

I could write a function that checks for NaNs and inserts the data into the new column, but before I do that: is there a faster built-in way to combine these strings into one column?

Upvotes: 1

Views: 142

Answers (5)

harpan

Reputation: 8631

You need to handle the NaNs with .fillna(). Here, you can fill them with ''.

df1['fullname'] = df1['firstname'] + ' ' + df1['lastname'].fillna('')

Output:

  firstname lastname      fullname
0      jack      NaN         jack
1      john   obrien   john obrien
2    donald    trump  donald trump
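One detail worth noting: with a ' ' separator and an empty fill value, the row without a lastname ends up as 'jack ' with a trailing space. A minimal sketch of one way to tidy that up, assuming a trailing space is unwanted, is to chain .str.strip() onto the result:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'firstname': ['jack', 'john', 'donald'],
                    'lastname': [np.nan, 'obrien', 'trump']})

# Same fillna approach as above; .str.strip() removes the trailing
# space that is left behind when lastname is missing.
df1['fullname'] = (df1['firstname'] + ' ' + df1['lastname'].fillna('')).str.strip()

print(df1['fullname'].tolist())
# ['jack', 'john obrien', 'donald trump']
```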

Upvotes: 3

rafaelc

Reputation: 59264

You may also use .add and specify a fill_value:

df1.firstname.add(" ").add(df1.lastname, fill_value="")

PS: Chaining too many .add calls or + operators is not recommended for strings, but for one or two columns you should be fine.

Upvotes: 1

BENY

Reputation: 323226

Here is what I would do (this also covers the case where more than two columns need to be joined):

df1.stack().groupby(level=0).agg(' '.join)
Out[57]: 
0            jack
1     john obrien
2    donald trump
dtype: object
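To illustrate the more-than-two-columns case, here is a rough sketch with a hypothetical third column, middlename, which is not part of the original question. The explicit .dropna() is a precaution on my part, since whether stack() drops missing values by itself depends on the pandas version:

```python
import numpy as np
import pandas as pd

# 'middlename' is a hypothetical extra column, added only for illustration.
df = pd.DataFrame({'firstname': ['jack', 'john', 'donald'],
                   'middlename': [np.nan, 'f', np.nan],
                   'lastname': [np.nan, 'obrien', 'trump']})

# Stack all columns into one long Series, drop missing values explicitly,
# then join the remaining strings for each original row (index level 0).
fullname = df.stack().dropna().groupby(level=0).agg(' '.join)

print(fullname.tolist())
```

Because missing values are removed before the join, no extra separators are left behind for rows with gaps.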

Upvotes: 0

Alex

Reputation: 7045

There is also Series.str.cat which can handle NaN and includes the separator.

df1["fullname"] = df1["firstname"].str.cat(df1["lastname"], sep=" ", na_rep="")

  firstname lastname      fullname
0      jack      NaN         jack
1      john   obrien   john obrien
2    donald    trump  donald trump
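str.cat also accepts a list of Series, so the same idea extends to several columns. The sketch below assumes a hypothetical middlename column that is not in the original question. Since na_rep="" still leaves the separators in place, it collapses repeated whitespace afterwards, which is just one possible cleanup:

```python
import numpy as np
import pandas as pd

# 'middlename' is a hypothetical extra column, added only for illustration.
df = pd.DataFrame({"firstname": ["jack", "john", "donald"],
                   "middlename": [np.nan, "f", np.nan],
                   "lastname": [np.nan, "obrien", "trump"]})

# Concatenate several columns at once; missing values become "".
joined = df["firstname"].str.cat([df["middlename"], df["lastname"]],
                                 sep=" ", na_rep="")

# Collapse the doubled or trailing separators left by missing values.
df["fullname"] = joined.str.replace(r"\s+", " ", regex=True).str.strip()

print(df["fullname"].tolist())
# ['jack', 'john f obrien', 'donald trump']
```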

Upvotes: 0

Mose Wintner

Reputation: 308

df1['fullname'] = df1['firstname']+df1['lastname'].fillna('')

Upvotes: 0
