UNagaswamy

Reputation: 2150

Pandas dataframe merge

I have a pandas DataFrame that was built by concatenating four DataFrames:

In [121]: all
Out[121]:
           E    H    N    S
   102P    Y  NaN  NaN  NaN
   103R    Y  NaN  NaN  NaN
   102P  NaN  NaN    Y  NaN
   103R  NaN  NaN    Y  NaN
   109F  NaN  NaN    Y  NaN
   103R  NaN    Y  NaN  NaN
   109F  NaN    Y  NaN  NaN
   102P  NaN  NaN  NaN    Y
   103R  NaN  NaN  NaN    Y
   109F  NaN  NaN  NaN    Y

I want to consolidate this into a dataframe like this:

          E    H    N    S
  102P    Y  NaN    Y    Y
  103R    Y    Y    Y    Y
  109F  NaN    Y    Y    Y

How can I merge the rows that share the same label in all.index?

Upvotes: 0

Views: 865

Answers (1)

Viktor Kerkez

Reputation: 46636

Group by the index (I presume from the data you posted that the values 102P, ... are in the index) and count the values. count only tallies non-null entries, so this returns a DataFrame of zeros and ones. Then just replace those with the appropriate values.

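>>> import numpy as np
>>> # count() tallies non-null entries per label; cast to object so 'Y' can be stored
>>> ndf = df.groupby(level=0).count().astype(object)
>>> ndf[ndf == 1] = 'Y'
>>> ndf[ndf == 0] = np.nan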
>>> ndf
         E    H  N  S
label                
102P     Y  NaN  Y  Y
103R     Y    Y  Y  Y
109F   NaN    Y  Y  Y

If a label can appear more than once with a value in the same column, the counts can exceed 1, so change the condition from ndf[ndf == 1] to ndf[ndf > 0].
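
For anyone who wants to paste this into a script, here is a self-contained sketch of the groupby approach. The frame is rebuilt from the table in the question. The names `pairs`, `counts` and `present` are only illustrative, and the final `where` step is an alternative I swapped in for the two in-place assignments so that the `> 0` case is covered as well. It is not part of the session above.

```python
import pandas as pd

# (label, column) pairs that hold a 'Y' in the concatenated frame from the question.
pairs = [
    ("102P", "E"), ("103R", "E"),
    ("102P", "N"), ("103R", "N"), ("109F", "N"),
    ("103R", "H"), ("109F", "H"),
    ("102P", "S"), ("103R", "S"), ("109F", "S"),
]

# One row per pair: 'Y' in the named column, NaN everywhere else.
df = pd.DataFrame(
    [{column: "Y"} for _, column in pairs],
    index=[label for label, _ in pairs],
    columns=["E", "H", "N", "S"],
)

# Count the non-null entries per index label and column.
counts = df.groupby(level=0).count()

# Keep 'Y' wherever at least one value was present; everything else becomes NaN.
present = counts > 0
ndf = pd.DataFrame("Y", index=counts.index, columns=counts.columns).where(present)

print(ndf)
```

Because the last step starts from a frame that already holds strings, it avoids writing 'Y' into integer columns.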

But why are you concatenating the data frames instead of combining them? Example:

>>> df1
      E   H   N   S
0                  
102P  Y NaN NaN NaN
103R  Y NaN NaN NaN
>>> df2
       E   H  N   S
0                  
102P NaN NaN  Y NaN
103R NaN NaN  Y NaN
109F NaN NaN  Y NaN

...

>>> from functools import reduce  # a builtin on Python 2, must be imported on Python 3
>>> reduce(lambda first, second: first.combine_first(second),
...        [df1, df2, df3, df4], pd.DataFrame())
        E    H  N  S
0                   
102P    Y  NaN  Y  Y
103R    Y    Y  Y  Y
109F  NaN    Y  Y  Y
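
Here is the same idea as a runnable sketch. df1 and df2 match the frames printed above. df3 and df4 are not shown in this answer, so they are reconstructed here from the rows in the question, which is an assumption on my part. `flag_frame` is a hypothetical helper that only exists to build the inputs.

```python
from functools import reduce

import pandas as pd

COLUMNS = ["E", "H", "N", "S"]


def flag_frame(labels, column):
    """Hypothetical helper: 'Y' in one column, NaN in all the others."""
    frame = pd.DataFrame(index=labels, columns=COLUMNS, dtype=object)
    frame[column] = "Y"
    return frame


df1 = flag_frame(["102P", "103R"], "E")
df2 = flag_frame(["102P", "103R", "109F"], "N")
# Assumed from the question's table, not shown in the answer itself.
df3 = flag_frame(["103R", "109F"], "H")
df4 = flag_frame(["102P", "103R", "109F"], "S")

# combine_first fills the NaN cells of the left frame from the right frame
# and takes the union of both indexes. With a non-empty list, reduce needs
# no initial value, so the empty DataFrame is left out here.
combined = reduce(
    lambda first, second: first.combine_first(second),
    [df1, df2, df3, df4],
)

print(combined)
```

This way the duplicated index labels never appear in the first place, so no groupby is needed afterwards.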

Upvotes: 1
