Reputation: 1354

split dataframe rows into multiple for fractional values

sample data:

sample = pd.DataFrame({'split_me': [1.5, 2, 4, 3.2], 'copy_me': ['A', 'B', 'C', 'D']})
out = pd.DataFrame({'split_me': [1, 0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.2], 'copy_me': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D']})

sample:  # input
    split_me    copy_me
0   1.5 A
1   2.0 B
2   4.0 C
3   3.2 D

out:  # desired output
    split_me    copy_me
0   1.0 A
1   0.5 A
2   1.0 B
3   1.0 B
4   1.0 C
5   1.0 C
6   1.0 C
7   1.0 C
8   1.0 D
9   1.0 D
10  1.0 D
11  0.2 D

I tried using something like sample.loc[sample.index.repeat(sample['split_me'])]. However this only repeats for integers and a value like 1.9 returns 1 row when I need it to return 2, and it leaves values in split_me unchanged, whereas I need to duplicate the rows, allocate 1 to split_me if it's greater than 1, otherwise allocate the value.

I can't think of a way to do this without getting loopy and complicated, my best approach is to ceil(split_me) and then run repeat, but I still need a way to allocate the value to duplicated rows. Looking for a simpler solution if anyone has it.

Upvotes: 0

Answers (4)

Dani Mesejo

Reputation: 61920

Try:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'split_me': [1.5, 2, 4, 3.2], 'copy_me': ['A', 'B', 'C', 'D']})


def expanded_index(s, c):
    index = np.repeat(1.0, s // 1)
    if (s % 1) > 0:
        index = np.append(index, [s % 1])
    return pd.Series(c, index)


res = pd.concat([expanded_index(s, c) for s, c in zip(sample["split_me"], sample["copy_me"])])
print(res)

Output

1.0    A
0.5    A
1.0    B
1.0    B
1.0    C
1.0    C
1.0    C
1.0    C
1.0    D
1.0    D
1.0    D
0.2    D
dtype: object

Upvotes: 2

Henry Ecker

Reputation: 35676

We could use np.modf to separate the fractional and integral parts of split_me, then create a new Series of 1s based on the only the repeat of the integral portion. append the non-zero fractional parts, sort_index to get into the expected order, lastly join back the columns and reset_index to restore the range index:

fractional, integral = np.modf(sample['split_me'])
df = (
    pd.Series(1, index=integral.index.repeat(integral), name=integral.name)
        .append(fractional[fractional.ne(0)]).sort_index(kind='stable')
        .to_frame().join(sample[['copy_me']]).reset_index(drop=True)
)

df:

    split_me copy_me
0        1.0       A
1        0.5       A
2        1.0       B
3        1.0       B
4        1.0       C
5        1.0       C
6        1.0       C
7        1.0       C
8        1.0       D
9        1.0       D
10       1.0       D
11       0.2       D

Setup and imports:

import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'split_me': [1.5, 2, 4, 3.2],
    'copy_me': ['A', 'B', 'C', 'D']
})

Upvotes: 2

Corralien

Reputation: 120479

Use a custom repeat function

repeat_float = lambda x: ([1.] * int(x // 1)) + ([x % 1] if x % 1 != 0 else [])
out = df['split_me'].apply(repeat_float).explode().astype(float) \
                    .to_frame().join(df['copy_me']).reset_index(drop=True)

Output:

>>> out
    split_me copy_me
0        1.0       A
1        0.5       A
2        1.0       B
3        1.0       B
4        1.0       C
5        1.0       C
6        1.0       C
7        1.0       C
8        1.0       D
9        1.0       D
10       1.0       D
11       0.2       D

Upvotes: 2

BENY

Reputation: 323326

Yes we can do this

out = sample.reindex(sample.index.repeat(np.ceil(sample['split_me'])))
out['new'] = 1
con = ~out['copy_me'].duplicated(keep='last') & (out['split_me']%1!=0)
out['new'] = out['new'].mask(con, out['split_me']%1)
out
Out[195]: 
   split_me copy_me  new
0       1.5       A  1.0
0       1.5       A  0.5
1       2.0       B  1.0
1       2.0       B  1.0
2       4.0       C  1.0
2       4.0       C  1.0
2       4.0       C  1.0
2       4.0       C  1.0
3       3.2       D  1.0
3       3.2       D  1.0
3       3.2       D  1.0
3       3.2       D  0.2

Upvotes: 3

split dataframe rows into multiple for fractional values

Answers (4)

Related Questions