Reputation: 915
I am trying to create a pandas df that looks like:
AAA BBB CCC
0 4 10 100
1 4 20 50
2 5 30 -30
3 5 40 -50
To build it, I am currently creating two dataframes:
df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})
and then appending the rows of df2 to df1 to create the desired df.
I tried to do
df = pd.DataFrame({'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
But I get an error with the key message:
ValueError: arrays must all be same length
I can of course do:
df = pd.DataFrame({'AAA' : [4,4,5,5], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
But isn't there a more elegant way to do this? This small example is easy to write out by hand, but if I scale up to many rows, the input becomes very long.
Upvotes: 5
Views: 12924
Reputation: 862441
I believe you need to join the lists with +:
df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print (df)
AAA BBB CCC
0 4 10 100
1 4 20 50
2 5 30 -30
3 5 40 -50
Or use np.repeat with np.concatenate:
import numpy as np
import pandas as pd
df = pd.DataFrame({'AAA' : np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
Alternative:
df = pd.DataFrame({'AAA' : np.repeat((4,5), 2),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
print (df)
AAA BBB CCC
0 4 10 100
1 4 20 50
2 5 30 -30
3 5 40 -50
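If the values do not all repeat the same number of times, np.repeat also accepts a sequence of counts, one per value. A minimal sketch, assuming made-up counts and a made-up fifth row that are not part of the question:

```python
import numpy as np
import pandas as pd

values = [4, 5]   # distinct values for column AAA
counts = [2, 3]   # hypothetical: repeat 4 twice and 5 three times

# One count per value; the result has sum(counts) elements.
aaa = np.repeat(values, counts)

df = pd.DataFrame({'AAA': aaa,
                   'BBB': [10, 20, 30, 40, 50],      # fifth row is invented
                   'CCC': [100, 50, -30, -50, 0]})   # for illustration only
print(df)
```

The other columns still have to supply exactly sum(counts) values, otherwise the same ValueError comes back.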
Upvotes: 7
Reputation: 5270
The error message is accurate: when you create a dataframe from a dictionary, all of the arrays must be the same length. And when you write a dictionary that gives the same key multiple times, only the last value is kept. So
{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}
is the same as
{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}
When you try to create a dataframe from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can build the desired 'AAA' column by joining the lists and then creating the dataframe from the result.
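You can see the overwritten key without pandas at all. A quick sketch that builds the same kind of dictionary and checks the length of each value (the name lengths is just for illustration):

```python
# A dict literal with a repeated key silently keeps only the last value.
d = {'AAA': [4] * 2, 'AAA': [5] * 2, 'BBB': [10, 20, 30, 40]}
print(d)  # {'AAA': [5, 5], 'BBB': [10, 20, 30, 40]}

# The lengths no longer match, which is what pandas complains about.
lengths = {key: len(value) for key, value in d.items()}
print(lengths)  # {'AAA': 2, 'BBB': 4}
```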
Upvotes: 1
Reputation: 61910
For a general solution you could do:
import pandas as pd
data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA' : [value for value, reps in data for _ in range(reps)], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print(df)
Here data is a list of (value, repetitions) tuples. For your particular example you have 4 with 2 repetitions and 5 with 2 repetitions, hence [(4, 2), (5, 2)].
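If you need this for several columns, the comprehension can be wrapped in a small helper. This is only a sketch: expand_runs is a hypothetical name, and the length check is an optional extra that reports the mismatching column before pandas raises.

```python
import pandas as pd

def expand_runs(runs):
    """Expand (value, repetitions) pairs into a flat list."""
    return [value for value, reps in runs for _ in range(reps)]

columns = {
    'AAA': expand_runs([(4, 2), (5, 2)]),
    'BBB': [10, 20, 30, 40],
    'CCC': [100, 50, -30, -50],
}

# Optional: fail early with the per-column lengths if they differ.
lengths = {name: len(col) for name, col in columns.items()}
assert len(set(lengths.values())) == 1, lengths

df = pd.DataFrame(columns)
print(df)
```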
Upvotes: 1