Reputation: 45
I’ve got two data frames :-
Df1
Time V1 V2
02:00 D3F3 0041
02:01 DD34 0040
Df2
FileName V1 V2
1111.txt D3F3 0041
2222.txt 0000 0040
Basically I want to compare the v1 v2 columns and if they match print the row time from df1 and the row from df2 filename. So far all i can find is the
isin()
, which simply gives you a boolean output.
So the output would be :
1111.txt 02:00
I started using dataframes because i though i could query the two df's on the V1 / V2 values but I can't see a way. Any pointers would be much appreciated
Upvotes: 2
Views: 5124
Reputation: 17
Try this:
client = boto3.client('s3')
obj = client.get_object(Bucket='', Key='')
data = obj['Body'].read()
df1 = pd.read_excel(io.BytesIO(data), sheet_name='0')
df2 = pd.read_excel(io.BytesIO(data), sheet_name='1')
head = df2.columns[0]
print(head)
data = df1.iloc[[8],[0]].values[0]
print(data)
print(df2)
df2.columns = df2.iloc[0]
df2 = df2.drop(labels=0, axis=0)
df2['Head'] = head
df2['ID'] = pd.Series([data,data])
print(df2)
df2.to_csv('test.csv',index=False)
Upvotes: 0
Reputation: 11
The most intuitive solution is: 1) iterate the V1 column in DF1; 2) for each item in this column, check if this item exists in the V1 column of DF2; 3) if the item exists in DF2's V1, then find the index of that item in the DF2 and then you would be able to find the file name.
Upvotes: 1
Reputation: 409
While I think Eric's solution is more pythonic, if your only aim is to print the rows on which df1 and df2 have v1 and v2 values the same, provided the two dataframes are of the same length, you can do the following:
for row in range(len(df1)):
if (df1.iloc[row,1:] == df2.iloc[row,1:]).all() == True:
print(df1.iloc[row], df2.iloc[row])
Upvotes: 0
Reputation: 337
You can use merge option with inner join
df2.merge(df1,how="inner",on=["V1","V2"])[["FileName","Time"]]
Upvotes: 0
Reputation: 1
You can try using pd.concat
.
On this case it would be like:
pd.concat([df1, df2.reindex(df1.index)], axis=1)
It will create a new dataframe with all the values, but in case there are some values that doesn't match in both dataframes, it'll return NaN
. If you doesn't want this to happen you must use this:
pd.concat([df1, df4], axis=1, join='inner')
If you wanna learn a bit more, use pydata: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Upvotes: 0
Reputation: 3010
Use merge
on the dataframe columns that you want to have the same values. You can then drop the rows with NaN values, as those will not have matching values. From there, you can print the merged dataframes values however you see fit.
df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})
df1.merge(df2, on=['V1', 'V2'], how='outer').dropna()
=== Output: ===
Time V1 V2 fn
1 10p 2 4 10.txt
Upvotes: 2