Reputation: 1
I'm trying to duplicate rows of a pandas DataFrame (v0.23.4, Python v3.7.1) based on an int value in one of the columns. I'm applying code from this question to do that, but I'm running into the following data type casting error: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. Basically, I don't understand why this code is attempting to cast to int32.
Starting with this,
import pandas as pd

dummy_dict = {'c1': ['a', 'b', 'c'],
              'c2': [0, 1, 2],
              'c3': ['textA', 'textB', 'textC']}
dummy_df = pd.DataFrame(dummy_dict)
c1 c2 c3
0 a 0 textA
1 b 1 textB
2 c 2 textC
I'm doing this:
dummy_df_test = dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2']))
I want this at the end. However, I'm getting the above error instead.
c1 c2 c3
0 a 0 textA
1 b 1 textB
2 c 2 textC
3 c 2 textC
Upvotes: 0
Views: 186
Reputation: 75150
Just a workaround:
pd.concat([dummy_df[dummy_df.c2.eq(0)],dummy_df.loc[dummy_df.index.repeat(dummy_df.c2)]])
Another fantastic suggestion courtesy @Wen
dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2'].clip(lower=1)))
c1 c2
0 a 0
1 b 1
2 c 2
2 c 2
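For completeness, here is a minimal, self-contained sketch of the clip approach with the index reset afterwards, so the result carries the 0..3 index shown in the question. The `c3` column is assumed from the question's printed output, and the names `counts` and `result` are just illustrative. It also assumes the dtype error does not occur in your environment.

```python
import pandas as pd

# Frame reconstructed from the question; the c3 column is assumed from its printed output.
dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                         'c2': [0, 1, 2],
                         'c3': ['textA', 'textB', 'textC']})

# Clip the repeat counts to at least 1 so rows with c2 == 0 are kept once.
counts = dummy_df['c2'].clip(lower=1)

# Repeat each index label counts[i] times, then renumber the rows from 0.
result = dummy_df.reindex(dummy_df.index.repeat(counts)).reset_index(drop=True)
print(result)
```

Without the clip, `repeat` would drop the row where c2 is 0, since it would be repeated zero times.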
Upvotes: 2
Reputation: 46479
In the first attempt all rows are duplicated, and in the second attempt just the row with index 2, both using the concat function. Here df is the DataFrame from the question.
print(df)
df2 = pd.concat([df] * 2, ignore_index=True)
print(df2)
df3 = pd.concat([df, df.iloc[[2]]])
print(df3)
c1 c2 c3
0 a 0 textA
1 b 1 textB
2 c 2 textC
c1 c2 c3
0 a 0 textA
1 b 1 textB
2 c 2 textC
3 a 0 textA
4 b 1 textB
5 c 2 textC
c1 c2 c3
0 a 0 textA
1 b 1 textB
2 c 2 textC
2 c 2 textC
If you plan to reset the index at the end:
df3 = df3.reset_index(drop=True)
Upvotes: 0
Reputation: 16172
I believe the answer as to why it's happening can be found here: https://github.com/numpy/numpy/issues/4384
Specifying the dtype as int32 should solve the problem as highlighted in the original comment.
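A minimal sketch of what that could look like with the question's data, assuming the error comes from the int64 repeat counts being passed to a NumPy build that expects int32 there (as can happen on platforms where the native index type is 32-bit). The `c3` column is assumed from the question's output, and the names `repeats` and `repeated` are just illustrative.

```python
import pandas as pd

# Frame reconstructed from the question; the c3 column is assumed from its printed output.
dummy_df = pd.DataFrame({'c1': ['a', 'b', 'c'],
                         'c2': [0, 1, 2],
                         'c3': ['textA', 'textB', 'textC']})

# Cast the repeat counts to int32 explicitly, so no implicit
# int64 -> int32 cast under the 'safe' rule is needed.
repeats = dummy_df['c2'].astype('int32')

repeated = dummy_df.reindex(dummy_df.index.repeat(repeats))
print(repeated)
```

Note that this only addresses the dtype; the row with c2 equal to 0 is still repeated zero times and therefore dropped, so it would need to be combined with one of the workarounds above to keep that row.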
Upvotes: 0