Kay

Reputation: 915

Create pandas DataFrame with repeating values

I am trying to create a pandas df that looks like:

   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

For now, I am implementing this by creating two dataframes:

df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})

and then appending the rows of df2 to df1 to create the desired df.
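
For reference, the append step described above could be sketched with pd.concat. This is only one way to do it, not necessarily what the asker ran; ignore_index=True is used here so the row labels run 0 to 3 as in the desired output.

```python
import pandas as pd

# The two partial frames from the question.
df1 = pd.DataFrame({'AAA': [4] * 2, 'BBB': [10, 20], 'CCC': [100, 50]})
df2 = pd.DataFrame({'AAA': [5] * 2, 'BBB': [30, 40], 'CCC': [-30, -50]})

# Stack df2 under df1 and renumber the rows from 0.
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```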

I tried to do

df = pd.DataFrame({'AAA' : [4] * 2, 'AAA': [5]*2,
                   'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df

But I get an error with the key message:

ValueError: arrays must all be same length

I can of course do:

df = pd.DataFrame({'AAA' : [4,4,5,5], 'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]}); df

But is there a more elegant way to do this? This small example is easy to write out by hand, but if I scale up to many rows, the input becomes very long.

Upvotes: 5

Views: 12924

Answers (3)

jezrael

Reputation: 862441

I believe you need to join the lists with +:

df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

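Or use np.repeat with np.concatenate:

import numpy as np
import pandas as pd

# Build each repeated run separately, then join the runs into one array.
df = pd.DataFrame({'AAA' : np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})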

Alternatively, pass both values to a single np.repeat call, which repeats each element twice:

df = pd.DataFrame({'AAA' :  np.repeat((4,5), 2),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

Upvotes: 7

Statistic Dean

Reputation: 5270

The error you get is quite clear. When you create a dataframe from a dictionary, all of the arrays must be the same length. When you create a dictionary and give the same key multiple times, only the last value is kept. So

{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

is the same as

{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

When you try to create a dataframe from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can build the desired 'AAA' column by joining the lists and then creating the dataframe from the joined list.
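
A small sketch of the point above, using plain Python only (the CCC column is left out for brevity): the duplicate key is dropped before pandas ever sees the dictionary, so the remaining column lengths no longer match.

```python
# A dict literal with a repeated key keeps only the last value for that key.
d = {'AAA': [4] * 2, 'AAA': [5] * 2, 'BBB': [10, 20, 30, 40]}

print(d['AAA'])   # the [4, 4] entry has been overwritten
print(len(d))     # only two keys remain: 'AAA' and 'BBB'

# Column lengths as pandas would see them: they differ, hence the ValueError.
lengths = {key: len(values) for key, values in d.items()}
print(lengths)
```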

Upvotes: 1

Dani Mesejo

Reputation: 61910

For a general solution you could do:

import pandas as pd

data = [(4, 2), (5, 2)]  # (value, repetitions) pairs
df = pd.DataFrame({'AAA' : [value for value, reps in data for _ in range(reps)],
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
print(df)

Here data is a list of (value, repetitions) tuples. For your particular example you have 4 with 2 repetitions and 5 with 2 repetitions, hence [(4, 2), (5, 2)].
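
A variation on the same idea, assuming NumPy is available: np.repeat also accepts a sequence of repeat counts, so the (value, repetitions) pairs can be unzipped and passed to it directly. The counts (2 and 3) and the five-row BBB column below are made up for illustration, to show unequal repetitions.

```python
import numpy as np
import pandas as pd

# Hypothetical pairs: 4 repeated twice, 5 repeated three times.
data = [(4, 2), (5, 3)]

# Split the pairs into a tuple of values and a tuple of counts.
values, reps = zip(*data)

# np.repeat repeats values[i] exactly reps[i] times.
aaa = np.repeat(values, reps)

df = pd.DataFrame({'AAA': aaa, 'BBB': [10, 20, 30, 40, 50]})
print(df)
```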

Upvotes: 1
