M-M

Reputation: 450

How to create new column in Pandas with condition to repeat by a value of another column?

I'm a beginner in Python, and I have a big DataFrame that looks like this:

import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df = df[["Total", "Type", "Count"]]  # assign the result so the column order is kept
df

Output:

   Total    Type    Count
0   10     Child    4
1   10       Boy    5
2   10      Girl    1
3   10     Senior   0
4   10      
5   10      
6   10      
7   10      
8   10      
9   10      

I want to get something like this:

    Total   Type    Count   New
0   10     Child       4    Child
1   10       Boy       5    Child
2   10      Girl       1    Child
3   10    Senior       0    Child
4   10                      Boy
5   10                      Boy
6   10                      Boy
7   10                      Boy
8   10                      Boy
9   10                      Girl

I don't know how to create a new column that repeats each Type value n times, where n is the value in Count.

Thanks!

Upvotes: 9

Views: 1059

Answers (5)

BENY

Reputation: 323226

Use repeat, after replacing the blanks in Count with 0:

df['New']=df.Type.repeat(df.Count.replace('',0)).values
df
Out[657]: 
  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
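
Depending on the pandas version, replace('', 0) may leave Count as an object column, which repeat can reject. A minimal sketch of the same idea with an explicit integer conversion, assuming the counts add up to the number of rows as they do in the question (the name counts is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Coerce Count to integers: blanks become NaN, then 0.
counts = pd.to_numeric(df['Count'], errors='coerce').fillna(0).astype(int)

# The assignment below only fits if the counts add up to the row count.
assert counts.sum() == len(df)

# Repeat each Type by its count; to_numpy() discards the repeated index
# so the values are assigned by position rather than by label.
df['New'] = df['Type'].repeat(counts.to_numpy()).to_numpy()
print(df)
```

If the counts do not add up to len(df), the assignment raises a length-mismatch error, so the check above fails early with a clearer signal.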

Upvotes: 8

jpp

Reputation: 164623

This is one way using itertools.chain and itertools.repeat:

from itertools import chain, repeat

# calculate number of non-blank rows
n = (df['Type'] != '').sum()

# extract values for these rows
vals = df[['Type', 'Count']].iloc[:n].values

# iterate and repeat values
df['New'] = list(chain.from_iterable(repeat(*row) for row in vals))

print(df)

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
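
To see what the two itertools functions contribute, here is a small sketch without pandas. The rows list is made-up sample data mirroring the question: repeat(value, times) yields value exactly times times, and chain.from_iterable joins the resulting iterables into one stream.

```python
from itertools import chain, repeat

# (value, times) pairs, mirroring the non-blank rows of the question
rows = [('Child', 4), ('Boy', 5), ('Girl', 1), ('Senior', 0)]

# repeat(value, times) yields value exactly `times` times
repeated = [list(repeat(value, times)) for value, times in rows]

# chain.from_iterable flattens the per-row lists into a single sequence
flat = list(chain.from_iterable(repeated))
print(flat)
```

A count of 0 simply contributes an empty list, which is why Senior never shows up in the new column.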

Upvotes: 1

U13-Forward

Reputation: 71560

Try the code below. I multiplied each df['Type'] value by its df['Count'], flattened the resulting lists, and assigned the flat list to a new column:

import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# repeat each Type Count times as a space-separated string, then split it into a list;
# rows whose Count is a blank string are skipped
dropped = [((x + ' ') * y).split() for x, y in zip(df['Type'], df['Count']) if not isinstance(y, str)]
# flatten the list of lists into one list
df['New'] = sum(dropped, [])
print(df)

Output:

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl

Upvotes: 1

Mohamed Thasin ah

Reputation: 11192

Try this:

df['New']= sum((df[df['Type']!=''].apply(lambda x: x['Count']*[x['Type']],axis=1)).values,[])

Output:

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
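
The sum(..., []) call is what flattens the per-row lists: starting from an empty list, it concatenates each list in turn. A small sketch on made-up sample lists, compared with itertools.chain, which gives the same result without building a new list at every step:

```python
from itertools import chain

# one list per non-blank row: Count * [Type]
lists = [4 * ['Child'], 5 * ['Boy'], 1 * ['Girl'], 0 * ['Senior']]

# sum with a list start value concatenates: [] + lists[0] + lists[1] + ...
flat_sum = sum(lists, [])

# chain.from_iterable walks the lists once instead of re-copying them
flat_chain = list(chain.from_iterable(lists))
print(flat_sum == flat_chain)
```

For a handful of rows the difference does not matter; on a big DataFrame the repeated concatenation in sum can become noticeably slower.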

Upvotes: 1

javidcf

Reputation: 59691

Not sure if this is the fastest way but it is a simple one:

from itertools import chain
import pandas as pd

df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
                    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
                    'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df['New'] = list(chain.from_iterable([t] * c for t, c in zip(df.Type, df.Count) if c))
print(df)

Output:

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
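
Note that the assignment only works because the counts (4 + 5 + 1 + 0) add up to the 10 rows of the frame. If that is not guaranteed in the real data, something like the sketch below could pad or truncate the repeated values first. The helper repeat_to_length and its fill parameter are hypothetical names for illustration, not part of pandas, and the small frame is made-up sample data.

```python
import pandas as pd

def repeat_to_length(values, counts, length, fill=''):
    """Hypothetical helper: repeat values by counts, then pad or truncate to length."""
    out = [v for v, c in zip(values, counts) for _ in range(c)]
    out = out[:length]                          # truncate if too long
    return out + [fill] * (length - len(out))   # pad if too short

# Sample frame where the counts (1 + 2) are fewer than the 5 rows
df = pd.DataFrame({'Type': ['Child', 'Boy', '', '', ''],
                   'Count': [1, 2, '', '', '']})

# Blanks become 0 so that every count is an integer
counts = pd.to_numeric(df['Count'], errors='coerce').fillna(0).astype(int)

df['New'] = repeat_to_length(df['Type'], counts, len(df))
print(df)
```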

Upvotes: 2
