sudha11
sudha11

Reputation: 23

Can we sort multiple data frames comparing values of each element in column

I have two csv files having some data and I would like to combine and sort data based on one common column: Here is data1.csv and data2.csv file:

enter image description here

enter image description here

The data3.csv is the output file where you I need data to be combined and sorted as below: enter image description here

How can I achieve this?

Upvotes: 0

Views: 38

Answers (1)

Mark Andersen
Mark Andersen

Reputation: 1028

Here's what I think you want to do here:

I created two dataframes with simple types, assume the first column is like your timestamp:

df1 = pd.DataFrame([[1,1],[2,2], [7,10], [8,15]], columns=['timestamp', 'A'])
df2 = pd.DataFrame([[1,5],[4,7],[6,9], [7,11]], columns=['timestamp', 'B'])
c = df1.merge(df2, how='outer', on='timestamp')
print(c)

The outer merge causes each contributing DataFrame to be fully present in the output even if not matched to the other DataFrame.

The result is that you end up with a DataFrame with a timestamp column and the dependent data from each of the source DataFrames.

Caveats:

  • You have repeating timestamps in your second sample, which I assume may be due to the fact you do not show enough resolution. You would not want true duplicate records for this merge solution, as we assume timestamps are unique.
  • I have not repeated the timestamp column here a second time, but it is easy to add in another timestamp column based on whether column A or B is notnull() if you really need to have two timestamp columns. Pandas merge() has an indicator option which would show you the source of the timestamp if you did not want to rely on columns A and B.
  • In the post you have two output columns named "timestamp". Generally you would not output two columns with same name since they are only distinguished by position (or color) which are not properties you should rely upon.

Upvotes: 1

Related Questions