Reputation: 127
I am using pandas in Python and want to transform a dataframe from wide to long while collapsing two columns. Specifically, I have a dataframe like this:
user town dog cat
tom london 1 1
dick miami 0 1
harry paris 0 0
tina london 3 0
donna dallas 0 1
hannah dallas 1 0
which I want to transform to a long table while collapsing dogs
and cats
into a category pet
:
user town pets
tom london dog
tom london cat
dick miami cat
tina london dog
tina london dog
tina london dog
donna dallas cat
hannah dallas dog
In R, it is possible to do this using the function pivot_long
but I want to learn how to do it in Python using pandas. So far I have tried to use pandas.melt(df, id_vars=['user', 'town'], value_vars=['dog', 'cat'])
but the output looks like:
user town variable value
tom london dog 1
tom london cat 1
dick miami dog 0
dick miami cat 1
...
Upvotes: 1
Views: 997
Reputation: 59579
set_index
and stack
the pet columns into a single Series. Renaming the columns axis brings that name along to the result. Then using Series.repeat
you can get the required duplication as all the information is in the Index.
s = df.set_index(['user', 'town']).rename_axis(columns='pets').stack()
#df1 = s.repeat(s).reset_index().drop(columns=0)
# Credit @Scott Boston for this simpler version.
df1 = s.repeat(s).index.to_frame(index=False)
print(df1)
user town pets
0 tom london dog
1 tom london cat
2 dick miami cat
3 tina london dog
4 tina london dog
5 tina london dog
6 donna dallas cat
7 hannah dallas dog
Upvotes: 4
Reputation: 46
I had to do a little bit of pandas manipulation to get it in the format you want. But, this gives the output you are looking for:
import pandas as pd
data = [["tom", "london", 1, 1]
,["dick", "miami", 0, 1]
,["harry", "paris", 0, 0]
,["tina", "london", 3, 0]
,["donna", "dallas", 0, 1]
,["hannah", "dallas", 1, 0]]
columns = ["user", "town", "dog", "cat"]
df = pd.DataFrame(data, columns=columns)
long = pd.melt(df, id_vars=['user', 'town'], value_vars=['dog', 'cat'])
long = long[long["value"] > 0]
long = long.loc[long.index.repeat(long["value"])][["user", "town", "variable"]]
print(long)
Upvotes: 3