psychcphyscst
psychcphyscst

Reputation: 127

Pandas wide to long by two columns

I am using pandas in Python and want to transform a dataframe from wide to long while collapsing two columns. Specifically, I have a dataframe like this:

user   town    dog  cat
tom    london  1    1
dick   miami   0    1
harry  paris   0    0
tina   london  3    0
donna  dallas  0    1
hannah dallas  1    0

which I want to transform to a long table while collapsing dogs and cats into a category pet:

user   town    pets
tom    london  dog
tom    london  cat
dick   miami   cat
tina   london  dog
tina   london  dog
tina   london  dog
donna  dallas  cat
hannah dallas  dog

In R, it is possible to do this using the function pivot_long but I want to learn how to do it in Python using pandas. So far I have tried to use pandas.melt(df, id_vars=['user', 'town'], value_vars=['dog', 'cat']) but the output looks like:

user   town   variable  value
tom    london dog       1
tom    london cat       1
dick   miami  dog       0
dick   miami  cat       1
...

Upvotes: 1

Views: 997

Answers (2)

ALollz
ALollz

Reputation: 59579

set_index and stack the pet columns into a single Series. Renaming the columns axis brings that name along to the result. Then using Series.repeat you can get the required duplication as all the information is in the Index.

s = df.set_index(['user', 'town']).rename_axis(columns='pets').stack()

#df1 = s.repeat(s).reset_index().drop(columns=0)

# Credit @Scott Boston for this simpler version.
df1 = s.repeat(s).index.to_frame(index=False)

print(df1)
     user    town pets
0     tom  london  dog
1     tom  london  cat
2    dick   miami  cat
3    tina  london  dog
4    tina  london  dog
5    tina  london  dog
6   donna  dallas  cat
7  hannah  dallas  dog

Upvotes: 4

Camille Church
Camille Church

Reputation: 46

I had to do a little bit of pandas manipulation to get it in the format you want. But, this gives the output you are looking for:

    import pandas as pd

    data = [["tom", "london", 1, 1]
           ,["dick", "miami", 0, 1]
           ,["harry", "paris", 0, 0]
           ,["tina", "london", 3, 0]
           ,["donna", "dallas", 0, 1]
           ,["hannah", "dallas", 1, 0]]

    columns = ["user", "town", "dog", "cat"]

    df = pd.DataFrame(data, columns=columns)
    long = pd.melt(df, id_vars=['user', 'town'], value_vars=['dog', 'cat'])
    long = long[long["value"] > 0]
    long = long.loc[long.index.repeat(long["value"])][["user", "town", "variable"]]
    print(long)

Upvotes: 3

Related Questions