Reputation: 633
I have two data frames and I want to match the entries in column1 of df1 with the entries in column 1 and 3 of df2. If all three entries match, print lines from both dataframes in the output file. For example:
df1:
a21 1 2 3
a32 4 5 6
a43 7 8 9
df2:
b21 10 21
b21 10 22
b43 10 43
output:
21 1 2 3 10
43 7 8 9 10
I'm currently doing the following and using str.extract
to get the integers:
import pandas as pd
import re
df1 = pd.read_table('data1.txt', delim_whitespace= True)
df1.columns = ['1','2','3','4']
num = df1['1'].str.extract('(\d+)').astype(int)
df2 = pd.read_table('data2.txt', delim_whitespace= True)
df2.columns = ['1','2','3']
num2 = df2['1'].str.extract('(\d+)').astype(int)
both = [df1, df2]
if num == num2:
if num == df2['3']:
result = pd.concat(both, axis=1)
print(result)
I'm not sure how to match the three values and concatenate the dataframes. Should I create dictionaries instead? Can someone point me in the right direction?
Upvotes: 0
Views: 54
Reputation: 164663
This is one way.
import pandas as pd
df1 = pd.DataFrame([['a21', 1, 2, 3],
['a32', 4, 5, 6],
['a43', 7, 8, 9]],
columns=['1', '2', '3', '4'])
df2 = pd.DataFrame([['b21', 10, 21],
['b21', 10, 22],
['b43', 10, 43]],
columns=['1', '2', '3'])
df1.index = df1['1'].apply(lambda x: int(x[1:]))
df2.index = df2['1'].apply(lambda x: int(x[1:]))
# filter df2 for where index = column 3
df3 = df2[df2.index == df2['3']]
# join onto df1
df4 = df1.merge(df3, left_index=True, right_index=True).drop(['1_x', '1_y'], 1)
# 2_x 3_x 4 2_y 3_y
# 1
# 21 1 2 3 10 21
# 43 7 8 9 10 43
Upvotes: 1