Reputation: 450
I'm a beginner in Python. I have a big DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# reorder the columns (the result has to be assigned back to take effect)
df = df[["Total", "Type", "Count"]]
df
Output:
   Total    Type  Count
0     10   Child      4
1     10     Boy      5
2     10    Girl      1
3     10  Senior      0
4     10
5     10
6     10
7     10
8     10
9     10
I want to have something like that:
   Total    Type  Count    New
0     10   Child      4  Child
1     10     Boy      5  Child
2     10    Girl      1  Child
3     10  Senior      0  Child
4     10                   Boy
5     10                   Boy
6     10                   Boy
7     10                   Boy
8     10                   Boy
9     10                  Girl
I don't know how to create a new column that repeats each Type value as many times as its Count.
Thanks!
Upvotes: 9
Views: 1059
Reputation: 323226
Use repeat, after replacing the blanks in Count with 0:
df['New'] = df.Type.repeat(df.Count.replace('', 0).astype(int)).values
df
Out[657]:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
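Note that the assignment only lines up because the counts (4 + 5 + 1 + 0) add up to the number of rows. A minimal sketch that makes this check explicit, assuming the blanks are empty strings; `counts` is just an illustrative name, and `pd.to_numeric` is swapped in for `replace` so the result has an integer dtype regardless of pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Blanks become NaN, then 0, and the result is forced to an integer dtype.
counts = pd.to_numeric(df['Count'], errors='coerce').fillna(0).astype(int)

# The repeated values must fill every row, otherwise the assignment fails.
if counts.sum() != len(df):
    raise ValueError('counts must add up to the number of rows')

# to_numpy() discards the repeated index so values are assigned by position.
df['New'] = df['Type'].repeat(counts).to_numpy()
print(df['New'].tolist())
```

If the counts do not add up to the number of rows, the sketch raises a ValueError up front instead of failing at the assignment with a length mismatch.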
Upvotes: 8
Reputation: 164623
This is one way, using itertools.chain and itertools.repeat:
from itertools import chain, repeat
# calculate number of non-blank rows
n = (df['Type'] != '').sum()
# extract values for these rows
vals = df[['Type', 'Count']].iloc[:n].values
# iterate and repeat values
df['New'] = list(chain.from_iterable(repeat(*row) for row in vals))
print(df)
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
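To see what chain and repeat are doing here, it may help to run them on plain Python data. A small sketch, where `pairs` is a hypothetical stand-in for the non-blank (Type, Count) rows:

```python
from itertools import chain, repeat

# Hypothetical (value, count) pairs mirroring the non-blank rows.
pairs = [('Child', 4), ('Boy', 5), ('Girl', 1), ('Senior', 0)]

# repeat(value, count) yields value count times; a count of 0 yields nothing.
# chain.from_iterable joins the per-pair iterators into one flat stream.
flat = list(chain.from_iterable(repeat(value, count) for value, count in pairs))

print(flat)
```

In the answer's code, `repeat(*row)` does the same thing by unpacking each [Type, Count] row into the two arguments of repeat.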
Upvotes: 1
Reputation: 71560
Try the code below. I multiplied each df['Type'] value by its df['Count'], flattened the resulting lists, and assigned the flat list to a new column:
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# skip rows whose Count is a blank string; repeat each Type as a space-separated string, then split it
dropped = [((x + ' ') * y).split() for x, y in zip(df['Type'], df['Count']) if not isinstance(y, str)]
df['New'] = sum(dropped, [])
print(df)
Output:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
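One caveat with the multiply-and-split trick: it relies on the Type values containing no whitespace. A short sketch of the difference, using a hypothetical label that is not in the question's data:

```python
label = 'Young adult'  # hypothetical label containing a space
count = 2

# Multiplying the string and splitting on whitespace breaks the label apart.
via_split = ((label + ' ') * count).split()

# Multiplying a one-element list keeps each label intact.
via_list = [label] * count

print(via_split)  # ['Young', 'adult', 'Young', 'adult']
print(via_list)   # ['Young adult', 'Young adult']
```

For the single-word types in the question both forms give the same result.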
Upvotes: 1
Reputation: 11192
Try this:
df['New']= sum((df[df['Type']!=''].apply(lambda x: x['Count']*[x['Type']],axis=1)).values,[])
Output:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
Upvotes: 1
Reputation: 59691
Not sure if this is the fastest way but it is a simple one:
from itertools import chain
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df['New'] = list(chain.from_iterable([t] * c for t, c in zip(df.Type, df.Count) if c))
print(df)
Output:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
Upvotes: 2