How can I merge one dataframe into another, inserting rows that do not exist?

Question

I'm working with an eyetracker that samples at 500 hz, thus I have a measurement every 2ms. I've loaded this data into a dataframe and reindexed it with the time samples, such that my row indexes match the time, in ms, where the recording was made:

Here's a slice of this dataframe (note that the event column contains empty strings right now):

           x       y      cr event
2000   109.9  1133.7  4264.0      
2001   109.0  1133.3  4267.0      
2002   107.1  1130.6  4269.0      
2003   104.3  1128.0  4270.0      
2004    97.8  1125.2  4268.0      
2005    89.8  1124.3  4266.0      
2006    79.2  1121.3  4274.0      
2007    69.5  1118.7  4284.0      
2008    59.6  1116.7  4293.0      
2009    53.0  1116.9  4291.0      
2010    48.6  1117.1  4290.0

As you can see, there are no odd-numbered time indexes, again, because the machine samples every 2 ms.

My second DataFrame tracks TTLs that are sent to the eyetracker. These TTLs can occur at any time with millisecond resolution, so there can be odd-numbered timestamps, here. This second DataFrame has two columns, t and msg. Here's a slice:

     t           msg
0    11911  WarningOnset
1    12510      CueOnset
2    14693  WarningOnset
3    17009  WarningOnset
4    17642      CueOnset

My goal is to merge the second dataframe into the first, such that each item in the msg column of dataframe2 gets placed into the event column of dataframe1 at the index indicated by column t.

This means that some of the indexes I'm targeting in dataframe1 do not yet exist, which is why my current approach is failing with a KeyError.

Here's what I tried:

samples.loc[events['t'], 'event'] = events['msg']  # throws KeyError

What is the correct way of doing this? For clarity, I'd like to either place NaNs in the columns that did not exist in dataframe2 (i.e. events in the example above).

EdChum · Accepted Answer

The reason that your initial attempt fails is that the values don't exist in your target dataframe, what you want to do is assign events where there is a match and set events to NaN or blank or whatever where there are no real events.

What you want to do is merge the other df to your first one on either an index (which should represent the same thing) or columns which should represent the same thing.

You will also need to rename your column in the second df.

So rename the column:

df1.rename({'msg':'event'})

Now set column 't' to be your index

df1.set_index(['t'],inplace=True)

Now merge:

df.merge(df1, left_index=True, right_index=True, how='left')

You can also use combine first which preserves the left hand side dataframes data:

df.combine_first(df1)

How can I merge one dataframe into another, inserting rows that do not exist?

Answers (1)

Related Questions