navige
navige

Reputation: 2517

Multindex Join, Column Names

I have a DataFrame df1 with columns logfile, pos, category, value.

Example:

df1 = pd.DataFrame({'logfile': ['log1.log', 'log1.log', 'log1.log', 'log2.log'],
                    'pos': [1, 2, 3, 1],
                    'category': ['cat1', 'cat2', 'cat1', 'cat2'],
                    'value' : [0.1, 0.2, 0.3, 0.4]})

I compute on df1 a count operation for each logfile, position and category:

res1 = df1.groupby(['logfile', 'pos', 'category']).agg(['count'])

I would like to join the resulting DataFrame res to another DataFrame df2 which has columns logfile, pos, calc.

Example:

df2 =  pd.DataFrame({'logfile': ['log1.log','log2.log'],
                'pos': [1, 2],
                'calc': [0.7, 0.9]})

Following what I found in similar questions, I do:

res1.reset_index().merge(df2, on=['logfile', 'pos'])

However, now I get as a result a DataFrame with columns logfile, pos, (logfile,), (pos,), (category,), (value,count), calc. I somehow cannot access the columns by reasonable names.

If I, instead, remove reset_index before merging, the category column is gone.

What I would like to have is a DataFrame with normal column names, but without loosing the column category.

I have Pandas in version 0.25 using in Jupyter Notebook.

Upvotes: 1

Views: 48

Answers (2)

piRSquared
piRSquared

Reputation: 294358

When you did your first groupby, you didn't focus on the value column and you aggregated with a list. This left you with a MultiIndex column object.

res1

                      value
                      count
logfile  pos category      
log1.log 1   cat1         1
         2   cat2         1
         3   cat1         1
log2.log 1   cat2         1

You could have

df1.groupby(['logfile', 'pos', 'category']).agg('count')
# df1.groupby(['logfile', 'pos', 'category']).count()

                       value
logfile  pos category       
log1.log 1   cat1          1
         2   cat2          1
         3   cat1          1
log2.log 1   cat2          1

OR

df1.groupby(['logfile', 'pos', 'category']).value.agg(['count'])

                       count
logfile  pos category       
log1.log 1   cat1          1
         2   cat2          1
         3   cat1          1
log2.log 1   cat2          1

This would leave you with a "Single" Index object

res1 = df1.groupby(['logfile', 'pos', 'category']).value.agg(['count'])
res1.merge(df2, on=['logfile', 'pos'])

    logfile  pos  count  calc
0  log1.log    1      1   0.7

OR

res1 = df1.groupby(['logfile', 'pos', 'category']).count()
res1.merge(df2, on=['logfile', 'pos'])

    logfile  pos  value  calc
0  log1.log    1      1   0.7

Upvotes: 2

iDrwish
iDrwish

Reputation: 3103

You can simply use as_index=False in the groupby operation.

res1 = df1.groupby(['logfile', 'pos', 'category'], as_index=False).agg('count')
res1.merge(df2, on=['logfile', 'pos'], suffixes=('_old', '_new'))

Update With your provided examples:

df1.groupby(['logfile', 'pos', 'category'], as_index=False).agg('count').merge(df2, on=['logfile', 'pos'])

returns

    logfile    pos  category    value   calc
0   log1.log    1   cat1           1    0.7

Upvotes: 2

Related Questions