C-B
C-B

Reputation: 33

How to re-structure this in python pandas? Merge, unstack or what?

Trying to re-structure a data frame with a format like this:

    key   ref name value
0    k1  None   N1     A
1  None    k1   N2     B
2  None    k1   N3     C
3    k2  None   N4     D
4    k3  None   N5     E
5  None    k3   N6     F
6  None    k3   N7     G

# In code
df = pd.DataFrame(columns=['key', 'ref', 'name', 'value'],
                  data=[
                    ['k1',None,'N1','A'],
                    [None,'k1','N2','B'],
                    [None,'k1','N3','C'],
                    ['k2',None,'N4','D'],
                    ['k3',None,'N5','E'],
                    [None,'k3','N6','F'],
                    [None,'k3','N7','G']])

Into this:

  key   ref name value name2 value2 name3 value3
0  k1    k1   N1     A    N2      B    N3      C
1  k2  None   N4     D  None   None  None   None
2  k3    k3   N5     E    N6      F    N7      G

But are struggeling to get it right. 'key' and 'ref' are not indexes above but feel free to elaborate of how to make use of them this way (source is an Excel-file of this format), if that is a part of the solution. The goal is to have names and values mapped accordingly to the example... (key's and ref's will then be discarded)

Tried with merge and stack but can't get it to work properly...

Note to following rules:

In other words:

Upvotes: 3

Views: 257

Answers (1)

divingTobi
divingTobi

Reputation: 2300

Maybe this is what you need:

import pandas as pd

df = pd.DataFrame(
    columns=["key", "ref", "name", "value"],
    data=[
        ["k1", None, "N1", "A"],
        [None, "k1", "N2", "B"],
        [None, "k1", "N3", "C"],
        ["k2", None, "N4", "D"],
        ["k3", None, "N5", "E"],
        [None, "k3", "N6", "F"],
        [None, "k3", "N7", "G"],
    ],
)

print(df)
ind = df["key"].isna()
df1 = df.loc[~ind]
df2 = df.loc[ind]

combo = (
    df1.merge(df2[["ref", "name", "value"]], left_on="key", right_on="ref", how="left")
    .fillna("")
    .groupby("key")
    .agg(name=pd.NamedAgg("name_y", ":".join), value=pd.NamedAgg("value_y", ":".join))
)

for c in ["name", "value"]:
    dx = combo[c].str.split(":", expand=True).add_prefix(c)
    df1 = df1.merge(dx, left_on="key", right_index=True)

print(df1)

First, we split df into two separate dataframes, depending on whether there is an entry in Column key.

Second, we create a combo dataframe, merging the df1 with a aggregated df2, where the aggregation creates two columns for value and name, in which all entries are joined into a ":"-separated string.

Finally, we use str.split with expand=True to split those strings into separate columns and merge it with df1.

  key   ref name value name0 name1 value0 value1
0  k1  None   N1     A    N2    N3      B      C
3  k2  None   N4     D        None          None
4  k3  None   N5     E    N6    N7      F      G

The None entries are not yet quite right, but that would be easy to fix.

Upvotes: 1

Related Questions