Reputation: 33
Trying to re-structure a data frame with a format like this:
key ref name value
0 k1 None N1 A
1 None k1 N2 B
2 None k1 N3 C
3 k2 None N4 D
4 k3 None N5 E
5 None k3 N6 F
6 None k3 N7 G
# In code
df = pd.DataFrame(columns=['key', 'ref', 'name', 'value'],
data=[
['k1',None,'N1','A'],
[None,'k1','N2','B'],
[None,'k1','N3','C'],
['k2',None,'N4','D'],
['k3',None,'N5','E'],
[None,'k3','N6','F'],
[None,'k3','N7','G']])
Into this:
key ref name value name2 value2 name3 value3
0 k1 k1 N1 A N2 B N3 C
1 k2 None N4 D None None None None
2 k3 k3 N5 E N6 F N7 G
But are struggeling to get it right. 'key' and 'ref' are not indexes above but feel free to elaborate of how to make use of them this way (source is an Excel-file of this format), if that is a part of the solution. The goal is to have names and values mapped accordingly to the example... (key's and ref's will then be discarded)
Tried with merge and stack but can't get it to work properly...
Note to following rules:
In other words:
Upvotes: 3
Views: 257
Reputation: 2300
Maybe this is what you need:
import pandas as pd
df = pd.DataFrame(
columns=["key", "ref", "name", "value"],
data=[
["k1", None, "N1", "A"],
[None, "k1", "N2", "B"],
[None, "k1", "N3", "C"],
["k2", None, "N4", "D"],
["k3", None, "N5", "E"],
[None, "k3", "N6", "F"],
[None, "k3", "N7", "G"],
],
)
print(df)
ind = df["key"].isna()
df1 = df.loc[~ind]
df2 = df.loc[ind]
combo = (
df1.merge(df2[["ref", "name", "value"]], left_on="key", right_on="ref", how="left")
.fillna("")
.groupby("key")
.agg(name=pd.NamedAgg("name_y", ":".join), value=pd.NamedAgg("value_y", ":".join))
)
for c in ["name", "value"]:
dx = combo[c].str.split(":", expand=True).add_prefix(c)
df1 = df1.merge(dx, left_on="key", right_index=True)
print(df1)
First, we split df into two separate dataframes, depending on whether there is an entry in Column key
.
Second, we create a combo dataframe, merging the df1
with a aggregated df2
, where the aggregation creates two columns for value
and name
, in which all entries are joined into a ":"-separated string.
Finally, we use str.split
with expand=True
to split those strings into separate columns and merge it with df1.
key ref name value name0 name1 value0 value1
0 k1 None N1 A N2 N3 B C
3 k2 None N4 D None None
4 k3 None N5 E N6 N7 F G
The None
entries are not yet quite right, but that would be easy to fix.
Upvotes: 1