Reputation: 1354
sample data:
sample = pd.DataFrame({'split_me': [1.5, 2, 4, 3.2], 'copy_me': ['A', 'B', 'C', 'D']})
out = pd.DataFrame({'split_me': [1, 0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.2], 'copy_me': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D']})
sample: # input
split_me copy_me
0 1.5 A
1 2.0 B
2 4.0 C
3 3.2 D
out: # desired output
split_me copy_me
0 1.0 A
1 0.5 A
2 1.0 B
3 1.0 B
4 1.0 C
5 1.0 C
6 1.0 C
7 1.0 C
8 1.0 D
9 1.0 D
10 1.0 D
11 0.2 D
I tried using something like sample.loc[sample.index.repeat(sample['split_me'])]
. However this only repeats for integers and a value like 1.9 returns 1 row when I need it to return 2, and it leaves values in split_me
unchanged, whereas I need to duplicate the rows, allocate 1 to split_me
if it's greater than 1, otherwise allocate the value.
I can't think of a way to do this without getting loopy and complicated, my best approach is to ceil(split_me)
and then run repeat
, but I still need a way to allocate the value to duplicated rows. Looking for a simpler solution if anyone has it.
Upvotes: 0
Views: 322
Reputation: 61920
Try:
import pandas as pd
import numpy as np
sample = pd.DataFrame({'split_me': [1.5, 2, 4, 3.2], 'copy_me': ['A', 'B', 'C', 'D']})
def expanded_index(s, c):
index = np.repeat(1.0, s // 1)
if (s % 1) > 0:
index = np.append(index, [s % 1])
return pd.Series(c, index)
res = pd.concat([expanded_index(s, c) for s, c in zip(sample["split_me"], sample["copy_me"])])
print(res)
Output
1.0 A
0.5 A
1.0 B
1.0 B
1.0 C
1.0 C
1.0 C
1.0 C
1.0 D
1.0 D
1.0 D
0.2 D
dtype: object
Upvotes: 2
Reputation: 35676
We could use np.modf
to separate the fractional and integral parts of split_me
, then create a new Series of 1s based on the only the repeat
of the integral portion. append
the non-zero fractional parts, sort_index
to get into the expected order, lastly join
back the columns and reset_index
to restore the range index:
fractional, integral = np.modf(sample['split_me'])
df = (
pd.Series(1, index=integral.index.repeat(integral), name=integral.name)
.append(fractional[fractional.ne(0)]).sort_index(kind='stable')
.to_frame().join(sample[['copy_me']]).reset_index(drop=True)
)
df
:
split_me copy_me
0 1.0 A
1 0.5 A
2 1.0 B
3 1.0 B
4 1.0 C
5 1.0 C
6 1.0 C
7 1.0 C
8 1.0 D
9 1.0 D
10 1.0 D
11 0.2 D
Setup and imports:
import numpy as np
import pandas as pd
sample = pd.DataFrame({
'split_me': [1.5, 2, 4, 3.2],
'copy_me': ['A', 'B', 'C', 'D']
})
Upvotes: 2
Reputation: 120479
Use a custom repeat function
repeat_float = lambda x: ([1.] * int(x // 1)) + ([x % 1] if x % 1 != 0 else [])
out = df['split_me'].apply(repeat_float).explode().astype(float) \
.to_frame().join(df['copy_me']).reset_index(drop=True)
Output:
>>> out
split_me copy_me
0 1.0 A
1 0.5 A
2 1.0 B
3 1.0 B
4 1.0 C
5 1.0 C
6 1.0 C
7 1.0 C
8 1.0 D
9 1.0 D
10 1.0 D
11 0.2 D
Upvotes: 2
Reputation: 323326
Yes we can do this
out = sample.reindex(sample.index.repeat(np.ceil(sample['split_me'])))
out['new'] = 1
con = ~out['copy_me'].duplicated(keep='last') & (out['split_me']%1!=0)
out['new'] = out['new'].mask(con, out['split_me']%1)
out
Out[195]:
split_me copy_me new
0 1.5 A 1.0
0 1.5 A 0.5
1 2.0 B 1.0
1 2.0 B 1.0
2 4.0 C 1.0
2 4.0 C 1.0
2 4.0 C 1.0
2 4.0 C 1.0
3 3.2 D 1.0
3 3.2 D 1.0
3 3.2 D 1.0
3 3.2 D 0.2
Upvotes: 3