j_90
j_90

Reputation: 99

Transform dataframe based on columns and rows

I have the following dataframe

CustomerNr Target Source Percentage
1001 A C 0.2
1004 D np.nan 0.3
1005 C D 0.4
1010 A D 0.5
import numpy as np
df = pd.DataFrame([[1001, 'A','C',0.2], [1004, 'D',np.nan,0.3],[1005, 'C','D',0.4], 
                   [1010, 'A','D',0.5]], columns=['CustomerNr','Target','Source','Percentage'])

to this one (any ideas how to formulate the title for this problem by the way)

import numpy as np
df = pd.DataFrame([['1001 Target' , 'A',0.2],
                   ['1001 Source' , 'C',0.2], 
                   ['1004 Target', 'D',0.3],
                   ['1004 Source', np.nan,0.3],
                   ['1005 Target', 'C',0.4],
                   ['1005 Source', 'D',0.4],
                   ['10010 Target', 'A',0.5],
                   ['10010 Source', 'D',0.5],
                  ], columns=['CustomerNr Scope','Value','Percentage'])
CustomerNr Scope Value Percentage
1001 Target A 0.2
1001 Source C 0.2
1004 Target D 0.3
1004 Source NaN 0.3
1005 Target C 0.4
1005 Source D 0.4
10010 Target A 0.5
10010 Source D 0.5

Upvotes: 0

Views: 64

Answers (3)

Farid
Farid

Reputation: 92

The solution will be like this.

import pandas as pd
df = pd.read_csv("test.csv", encoding='utf-8')
df
CustomerNr  Target  Source  Percentage
0   1001    A         C         0.2
1   1004    D         np.nan    0.3
2   1005    C         D         0.4
3   1010    A         D         0.5

Solution

new_df = pd.DataFrame()
new_indx = len(df)
for ind, row in df.iterrows():
    
    print(new_indx,"  ", ind)
    new_df.at[ind, "CustomerNrScoope"] = str(row['CustomerNr'])+'Target'
    new_df.at[ind,'Value'] = row['Target']
    new_df.at[ind, 'Percentage'] = row['Percentage']
    
    new_df.at[new_indx,"CustomerNrScoope"] = str(row['CustomerNr'])+'Source'
    new_df.at[new_indx,'Value'] = row['Source']
    new_df.at[new_indx,'Percentage'] = row['Percentage']
    new_indx = new_indx +1

new_df

Output

CustomerNrScoope    Value   Percentage
0   1001Target        A        0.2
4   1001Source        C        0.2
1   1004Target        D        0.3
5   1004Source        np.nan    0.3
2   1005Target        C        0.4
6   1005Source        D        0.4
3   1010Target        A        0.5
7   1010Source        D        0.5

Upvotes: 1

mcsoini
mcsoini

Reputation: 6642

You could use pandas stack to achieve this:

(df.set_index(["CustomerNr", "Percentage"])
         .rename_axis("Scope", axis=1)
         .stack(dropna=False)
         .rename("Value")
         .reset_index()
         .assign(CustomerNrScope=lambda df: df[["CustomerNr", "Scope"]].astype(str).apply(" ".join, axis=1)))

Or concatenate the tables for source and target:

df_new = pd.concat([df[["CustomerNr", tscol, "Percentage"]]
               .rename(columns={tscol: "Value"})
               .assign(Scope=tscol)
           for tscol in ["Target", "Source"]])
df_new["CustomerNr Scope"] = df_new.CustomerNr.astype(str) + " " + df_new.Scope


# result
   CustomerNr Value  Percentage   Scope CustomerNr Scope
0        1001     A         0.2  Target      1001 Target
1        1004     D         0.3  Target      1004 Target
2        1005     C         0.4  Target      1005 Target
3        1010     A         0.5  Target      1010 Target
0        1001     C         0.2  Source      1001 Source
1        1004   NaN         0.3  Source      1004 Source
2        1005     D         0.4  Source      1005 Source
3        1010     D         0.5  Source      1010 Source


Or (building on Zephyrus' answer) use melt with Percentage as additional id_var to get the the desired table right away (assuming that Percentage uniquely depends on CustomerNr):

pd.melt(df, id_vars=['CustomerNr', "Percentage"],
        value_vars=['Target', 'Source'],
        var_name='Scope')

Upvotes: 1

Zephyrus
Zephyrus

Reputation: 364

You can use pandas melt to unpivot your dataframe:

df_melted =  pd.melt(df, id_vars=['CustomerNr'], value_vars=['Target', 'Source'], var_name='Scope')

This doesn't include the 'Percentage' column, but you can merge that back into the new dataframe:

df_melted = df_melted.merge(df[[ 'CustomerNr', 'Percentage']], left_on='CustomerNr', right_on='CustomerNr' )

If you want your `'CustomerNr'` column and `'Scope`' column together you can easily add them together to one column. 

Upvotes: 1

Related Questions