Quipz
Quipz

Reputation: 71

Regroup the data in one dataframe using the information from another dataframe

I have the following two dataframes:

import pandas as pd
from datetime import datetime

df_two = pd.DataFrame({'A': ['Jan', 'Jan', 'Feb', 'Feb', 'March', 'March'],
                       'B': ['Feb', 'Feb', 'March', 'March', 'Jan', 'Jan'],
                       'C': ['March', 'March', 'Jan', 'Jan', 'Feb', 'Feb']},
                      index=[datetime(2020, 1, 1), datetime(2020, 1, 2),
                             datetime(2020, 1, 3), datetime(2020, 1, 4),
                             datetime(2020, 1, 5), datetime(2020, 1,6)] )

df_one = pd.DataFrame({'series_one': [2, 5, 16, 17, 88, 89],
                       'series_two': [12, 15, 86, 87, 8, 9],
                       'series_three': [82, 85, 6, 7, 18, 19]},
                      index=[datetime(2020, 1, 1), datetime(2020, 1, 2),
                             datetime(2020, 1, 3), datetime(2020, 1, 4),
                             datetime(2020, 1, 5), datetime(2020, 1, 6)])

which look the like this:

df_one
            series_one  series_two  series_three
2020-01-01           2          12            82
2020-01-02           5          15            85
2020-01-03          16          86             6
2020-01-04          17          87             7
2020-01-05          88           8            18
2020-01-06          89           9            19

and

df_two
                A      B      C
2020-01-01    Jan    Feb  March
2020-01-02    Jan    Feb  March
2020-01-03    Feb  March    Jan
2020-01-04    Feb  March    Jan
2020-01-05  March    Jan    Feb
2020-01-06  March    Jan    Feb

I would like to use the information of the two dataframes and get the following view:

            Jan  Feb  March
2020-01-01    2   12     82
2020-01-02    5   15     85
2020-01-03    6   16     86
2020-01-04    7   17     87
2020-01-05   88   18     88
2020-01-06   89   19     89

How do I do this?

Upvotes: 3

Views: 125

Answers (3)

Pygirl
Pygirl

Reputation: 13349

you can try using apply:

def func(row):
    y = df_one.loc[row.name]
    u = (dict(zip(row.values,y.values)))
    return u
res = pd.DataFrame(list(df_two.apply(func,axis=1)), index=df_two.index)

res:

Jan Feb March
2020-01-01 2 12 82
2020-01-02 5 15 85
2020-01-03 6 16 86
2020-01-04 7 17 87
2020-01-05 8 18 88
2020-01-06 9 19 89

map months (Jan, Feb, March) with values

mapping will look like this:

{'Jan': 2, 'Feb': 12, 'March': 82}
{'Jan': 5, 'Feb': 15, 'March': 85}
{'Feb': 16, 'March': 86, 'Jan': 6}
{'Feb': 17, 'March': 87, 'Jan': 7}
{'March': 88, 'Jan': 8, 'Feb': 18}
{'March': 89, 'Jan': 9, 'Feb': 19}

using that you can create a dataframe.

Assuming both of them are having same indices and same size.

One liner solution:

res = pd.DataFrame([dict(zip(df_two.to_numpy()[idx], l)) for idx,l in enumerate(df_one.to_numpy())], index=df_one.index) 

If you want to work on common indices.

i = df_one.index.intersection(df_two.index)
res = pd.DataFrame([dict(zip(df_two.loc[i].to_numpy()[idx], l)) for idx,l in enumerate(df_one.loc[i].to_numpy())], index=i)

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153510

Let's use some pandas reshaping and concatenating methods:

df_one = df_one.rename(columns=dict(zip(df_one.columns, df_two.columns)))
df_out = df_one.stack().reset_index(name='value').merge(df_two.stack().reset_index(name='month'))
df_out = df_out.pivot('level_0', 'month', 'value').rename_axis(index=None, columns=None)
print(df_out)

Output:

            Feb  Jan  March
2020-01-01   12    2     82
2020-01-02   15    5     85
2020-01-03   16    6     86
2020-01-04   17    7     87
2020-01-05   18    8     88
2020-01-06   19    9     89

Details:

  • First, lets get the column headers to match using rename, dict, and zip
  • Stack each dataframe and reset_index creating common column headers to merge on
  • Use pivot to reshape resulting dataframe

Upvotes: 2

Let's try
Let's try

Reputation: 1058

It probably won't solve your problem if it is much more complicated, but it could be a starting point:

pd.DataFrame(list(df_one.apply(lambda x: [x[list(df_two.loc[x.name]=="Jan").index(True)],
                        x[list(df_two.loc[x.name]=="Feb").index(True)],
                        x[list(df_two.loc[x.name]=="March").index(True)]],axis = 1)),
             index = df_one.index,
             columns = ["Jan","Feb", "March"])

Output:

            Jan Feb March
2020-01-01  2   12  82
2020-01-02  5   15  85
2020-01-03  6   16  86
2020-01-04  7   17  87
2020-01-05  8   18  88
2020-01-06  9   19  89

Upvotes: 1

Related Questions