nomnomyang
nomnomyang

Reputation: 65

how to add new column with string and distribute it perfectly along number of rows in python

I have this kind of data frame:

no    id       size  type a complete   type a incomplete   type b complete   type b incomplete   type c complete   type c incomplete

1    shoes     270      1                    2                 0                  0                  1                  1
2    shoes     275      2                    2                 1                  0                  0                  1

and I want to simplify the data frame like below:

no    id      size    type       status           value
1    shoes    270      a         complete          1
1    shoes    270      a         incomplete        2
1    shoes    270      b         complete          0
1    shoes    270      b         incomplete        0
1    shoes    270      c         complete          1
1    shoes    270      c         incomplete        1
2    shoes    275      a         complete          2
2    shoes    275      a         incomplete        2
2    shoes    275      b         complete          1
2    shoes    275      b         incomplete        0
2    shoes    275      c         complete          0
2    shoes    275      c         incomplete        1

what I have tried so far was doing df.melt first to get the "type" column like this way:

df = df.melt(id_vars = ['no', 'id', 'size'], var_name = 'type', value_name = 'value').sort_values('no')

then make "status" column and distribute it manually:

status = ['complete', 'incomplete']

df['status'] = (df.index % len(status)).map(dict(enumerate(status)))

but it did not give me the perfect desired table as I expect.

Is there any other way I can do to get the expected table?

Thank you!

Upvotes: 1

Views: 68

Answers (1)

jezrael
jezrael

Reputation: 863166

You can convert tot splitted columns to MultiIndex by DataFrame.set_index, then split columns names by space with remove first level (type string) by MultiIndex.droplevel for 2 levels MultiIndex in columns, add columns names by DataFrame.rename_axis and reshape by DataFrame.stack with convert Series to DataFrame by Series.reset_index:

df = df.set_index(['no','id','size'])
df.columns = df.columns.str.split(expand=True).droplevel(0)
df = (df.rename_axis(['type','status'], axis=1)
        .stack([0,1], dropna=False)
        .reset_index(name='value'))
print (df)
    no     id  size type      status  value
0    1  shoes   270    a    complete      1
1    1  shoes   270    a  incomplete      2
2    1  shoes   270    b    complete      0
3    1  shoes   270    b  incomplete      0
4    1  shoes   270    c    complete      1
5    1  shoes   270    c  incomplete      1
6    2  shoes   275    a    complete      2
7    2  shoes   275    a  incomplete      2
8    2  shoes   275    b    complete      1
9    2  shoes   275    b  incomplete      0
10   2  shoes   275    c    complete      0
11   2  shoes   275    c  incomplete      1

Upvotes: 2

Related Questions