Reputation: 1569
I have a csv file with the following column:
timestamp. message. name. DestinationUsername. sourceUsername
13.05. hello. hello. name1.
13.05. hello. hello. name2. 43565
what I would like to achieve is to merge together DestinationUsername
and SourceUsername
into a new column called ID
What I have done so far is the following
f=pd.read_csv('file.csv')
f['userID'] = f.destinationUserName + f.sourceUserName
keep_col = ['@timestamp', 'message', 'name', 'destinationUserName', 'sourceUserName', 'userID']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)
But this does not work as expected, because in the output I can see if one of the column destinationUserName
or sourceUsername
is empty, than the userID
is empty, the userId get populated only id both
destinationUserName and sourceUserName are populated already.
Can anyone help me to understand how I can go over this problem please?
And please if you need more infos just ask me
Upvotes: 1
Views: 95
Reputation: 58
combine=lambda x: '{}{}'.format(x.DestinationUsername,x.sourceUsername)
df['ID'] = df.apply(combine, axis = 1)
df.drop(['DestinationUsername','sourceUsername'],axis=1)
Upvotes: 0
Reputation: 24314
you can typecast the column to string and then remove 'nan' by replace()
method:
df['ID']=(df['DestinationUsername'].astype(str)
+
df['sourceUsername'].astype(str).replace('nan','',regex=True))
OR
df['ID']=df[['DestinationUsername','sourceUsername']].astype(str)
.agg(''.join,1)
.replace('nan','',regex=True)
Note: you can also use apply()
in place of agg()
method
output of df['ID']
:
0 name1.
1 name2.43565.0
dtype: object
Upvotes: 1