Reputation: 457
I have two data frames in Python like the following:
df1
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 f2b6769129 97bb97bebc
46 ca0464878d e276539bc2
51 62f2905a7a 8dfabd6d61
57 21032ca3bc 1f7e5e0c6e
62 f7e7fdd8ce eb6cf4af99
64 f536998bbb 7fc39eacd1
80 6069198f63 d873a71620
99 0ba61a6f66 a6cf7af3eb
102 e8b579b776 c8048fd459
df2
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 Arthur Anderson
46 Teresa Johns
51 Louise Hurwitz
57 Timothy Addy
62 Jeffery Wilson
64 Andres Tuller
80 Daniel Green
99 Frank Nader
102 Faith Young
I want to join these two data frames on CUSTOMER_KEY (which I can do with merge) and then concatenate parts of a few columns to form new strings in the result data frame. For the data frames above, the result I am looking for is as follows:
result_df
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 Artf2b676 And97bb97
46 Terca0464 Johe27653
Basically, take the first 3 characters of LAST_NAME from df2 (e.g. Art) and the first 6 characters of LAST_NAME from df1 (e.g. f2b676), and concatenate them into the new LAST_NAME column. The same applies to the other columns (here, FIRST_NAME).
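For reference, a SQL-style SUBSTRING(s, 1, n) (1-based, first n characters) corresponds to the Python slice s[:n]. A minimal sketch of the rule applied to plain strings from the first row, using the 3 + 6 split shown in the expected output; the helper name combine and its default lengths are hypothetical:

```python
def combine(df2_value: str, df1_value: str, n_df2: int = 3, n_df1: int = 6) -> str:
    # Hypothetical helper: first n_df2 chars of the df2 value + first n_df1 chars of the df1 value
    return df2_value[:n_df2] + df1_value[:n_df1]

print(combine("Arthur", "f2b6769129"))    # Artf2b676
print(combine("Anderson", "97bb97bebc"))  # And97bb97
```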
How can I achieve this?
Thanks and Regards
Bala
Upvotes: 0
Views: 2025
Reputation: 323316
Using .str slicing:
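# Series addition aligns on the index, so this assumes df1 and df2 list customers in the same row order
df2['LAST_NAME'] = df2['LAST_NAME'].str[:3] + df1['LAST_NAME'].str[:6]
df2['FIRST_NAME'] = df2['FIRST_NAME'].str[:3] + df1['FIRST_NAME'].str[:6]
df2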
Out[768]:
CUSTOMER_KEY LAST_NAME FIRST_NAME
0 30 Artf2b676 And97bb97
1 46 Terca0464 Johe27653
2 51 Lou62f290 Hur8dfabd
3 57 Tim21032c Add1f7e5e
4 62 Jeff7e7fd Wileb6cf4
5 64 Andf53699 Tul7fc39e
6 80 Dan606919 Gred873a7
7 99 Fra0ba61a Nada6cf7a
8 102 Faie8b579 Youc8048f
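The assignment above relies on pandas aligning the two Series on their default index (0 to 8), so it pairs the right rows only because df1 and df2 list customers in the same order. A minimal sketch, assuming the frames might not be in the same order, that aligns on CUSTOMER_KEY instead by making it the index; df1 is deliberately reversed here, and the names a, b and out are illustrative:

```python
import pandas as pd

# First three rows from the question; df1 is deliberately listed in reverse order
df1 = pd.DataFrame({'CUSTOMER_KEY': [51, 46, 30],
                    'LAST_NAME': ['62f2905a7a', 'ca0464878d', 'f2b6769129'],
                    'FIRST_NAME': ['8dfabd6d61', 'e276539bc2', '97bb97bebc']})
df2 = pd.DataFrame({'CUSTOMER_KEY': [30, 46, 51],
                    'LAST_NAME': ['Arthur', 'Teresa', 'Louise'],
                    'FIRST_NAME': ['Anderson', 'Johns', 'Hurwitz']})

# Index both frames by the key so Series arithmetic pairs rows by CUSTOMER_KEY, not by position
a = df1.set_index('CUSTOMER_KEY')
b = df2.set_index('CUSTOMER_KEY')

# First 3 characters from df2 + first 6 characters from df1, per column
out = pd.DataFrame({col: b[col].str[:3] + a[col].str[:6]
                    for col in ['LAST_NAME', 'FIRST_NAME']})
out = out.rename_axis('CUSTOMER_KEY').reset_index()
print(out)
```

A key present in only one frame would come out as NaN in this version, because concatenating a string with a missing value yields a missing value.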
If you need to merge first:
result = df1.merge(df2, on=['CUSTOMER_KEY'])
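The merge above only performs the join. A minimal sketch of the concatenation step that follows it, using the first two rows of the question's frames: pandas gives the overlapping LAST_NAME and FIRST_NAME columns its default suffixes _x (from df1) and _y (from df2), so rows are paired by CUSTOMER_KEY rather than by position. The 3 and 6 character lengths are taken from the expected output.

```python
import pandas as pd

# First two rows of each data frame from the question
df1 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['f2b6769129', 'ca0464878d'],
                    'FIRST_NAME': ['97bb97bebc', 'e276539bc2']})
df2 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['Arthur', 'Teresa'],
                    'FIRST_NAME': ['Anderson', 'Johns']})

# Join on the key; overlapping columns become LAST_NAME_x/_y and FIRST_NAME_x/_y
result = df1.merge(df2, on='CUSTOMER_KEY')

result_df = result[['CUSTOMER_KEY']].copy()
for col in ['LAST_NAME', 'FIRST_NAME']:
    # First 3 characters from df2 (suffix _y) + first 6 characters from df1 (suffix _x)
    result_df[col] = result[col + '_y'].str[:3] + result[col + '_x'].str[:6]

print(result_df)
```

With the full tables from the question, the same loop produces all nine rows.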
Upvotes: 3
Reputation: 637
Using merge plus .str.slice:
import pandas as pd
df = pd.DataFrame([
['30','f2b6769129','97bb97bebc'],
['46','ca0464878d','e276539bc2'],
['51','62f2905a7a','8dfabd6d61'],
['57','21032ca3bc','1f7e5e0c6e'],
['62','f7e7fdd8ce','eb6cf4af99'],
['64','f536998bbb','7fc39eacd1'],
['80','6069198f63','d873a71620'],
['99','0ba61a6f66','a6cf7af3eb'],
['102','e8b579b776','c8048fd459']]
)
df2 = pd.DataFrame([
['30','Arthur','Anderson'],
['46','Teresa','Johns'],
['51','Louise','Hurwitz'],
['57','Timothy','Addy'],
['62','Jeffery','Wilson'],
['64','Andres','Tuller'],
['80','Daniel','Green'],
['99','Frank','Nader'],
['102','Faith','Young']]
)
keys = ['CUSTOMER_KEY','LAST_NAME','FIRST_NAME']
df.columns = keys
df2.columns = keys
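# Join on CUSTOMER_KEY; overlapping columns get _1 (from df) and _2 (from df2)
df_join = pd.merge(df, df2, on="CUSTOMER_KEY", suffixes=('_1', '_2'))
# First 3 characters from df2 + first 6 characters from df1, as in the expected output
df_join['LAST_NAME'] = df_join['LAST_NAME_2'].str.slice(0, 3) + df_join['LAST_NAME_1'].str.slice(0, 6)
df_join['FIRST_NAME'] = df_join['FIRST_NAME_2'].str.slice(0, 3) + df_join['FIRST_NAME_1'].str.slice(0, 6)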
result_df = df_join[keys]
result_df.head()
Upvotes: 1