Reputation: 1061
I have a pandas DataFrame with the variable names, the value of each, and a count column
(which gives the frequency of that row):
df = pd.DataFrame({'var':['A', 'B', 'C'], 'value':[10, 20, 30], 'count':[1,2,3]})
  var  value  count
    A     10      1
    B     20      2
    C     30      3
I want to use the count column to get an output like this:
  var  value
    A     10
    B     20
    B     20
    C     30
    C     30
    C     30
What is the best way to do that?
Upvotes: 8
Views: 1094
Reputation: 71689
You can use index.repeat:
i = df.index.repeat(df['count'])
d = df.loc[i, :'value'].reset_index(drop=True)
  var  value
0   A     10
1   B     20
2   B     20
3   C     30
4   C     30
5   C     30
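To see why this works, it can help to look at the intermediate index. The following is a minimal sketch using the question's data; the names labels and d are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'var': ['A', 'B', 'C'], 'value': [10, 20, 30], 'count': [1, 2, 3]})

# Each index label is repeated as many times as its 'count' value.
i = df.index.repeat(df['count'])
labels = list(i)  # labels 0, 1, 1, 2, 2, 2

# .loc slices by label and includes the end label, so :'value' keeps
# the 'var' and 'value' columns and leaves out 'count'.
d = df.loc[i, :'value'].reset_index(drop=True)
print(d)
```

Note that the :'value' slice relies on 'count' being the last column. If the column order differs, selecting the columns explicitly with ['var', 'value'] is probably safer.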
Upvotes: 7
Reputation: 153480
Use repeat with reindex for this short one-liner:
df.reindex(df.index.repeat(df['count']))
Output:
  var  value  count
0   A     10      1
1   B     20      2
1   B     20      2
2   C     30      3
2   C     30      3
2   C     30      3
Or to eliminate the 'count' column:
df[['var','value']].reindex(df.index.repeat(df['count']))
OR
df.reindex(df.index.repeat(df['count'])).drop('count', axis=1)
Output:
  var  value
0   A     10
1   B     20
1   B     20
2   C     30
2   C     30
2   C     30
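The outputs above keep the repeated index labels (0, 1, 1, 2, 2, 2). If a fresh 0..n-1 index is wanted, as in the question's expected output, one possible sketch is to chain reset_index(drop=True); the name out is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'var': ['A', 'B', 'C'], 'value': [10, 20, 30], 'count': [1, 2, 3]})

out = (
    df.reindex(df.index.repeat(df['count']))  # repeat each row 'count' times
      .drop(columns='count')                  # the helper column is no longer needed
      .reset_index(drop=True)                 # replace duplicated labels with 0..n-1
)
print(out)
```

This assumes the original index is unique, as it is here; reindex raises an error on an index with duplicate labels.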
Upvotes: 3
Reputation: 6758
Using Series.repeat:
import pandas as pd
df = pd.DataFrame({'var':['A', 'B', 'C'], 'value':[10, 20, 30], 'count':[1,2,3]})
new_df = pd.DataFrame()
new_df['var'] = df['var'].repeat(df['count'])
new_df['value'] = df['value'].repeat(df['count'])
new_df
  var  value
0   A     10
1   B     20
1   B     20
2   C     30
2   C     30
2   C     30
Upvotes: 2
Reputation: 2233
There are many, many ways to achieve this. Here is one cheeky approach that I like doing:
df.transform({
    "count": lambda x: [i for i in range(x)],
    "var": lambda x: x,
    "value": lambda x: x
}).explode("count").drop("count", axis=1)
Upvotes: 1