Reputation: 339
Is there a function which allows you to efficiently append a NumPy array directly to a DataFrame?
Variables:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=['col1', 'col2', 'col3'])
Out[1]: +------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|      |      |      |
+------+------+------+
arr = np.empty(3)
# array is populated with values. Random numbers are chosen in this example,
# but in my program, the numbers are not arbitrary.
arr[0] = 756
arr[1] = 123
arr[2] = 452
Out[2]: array([756., 123., 452.])
How do I directly append arr to the end of df to get this?
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 756  | 123  | 452  |
+------+------+------+
I've tried using df.append(arr), but it doesn't accept NumPy arrays. I could convert the NumPy array into a DataFrame and then append it, but I think that would be very inefficient, especially over millions of iterations. Is there a more efficient way to do it?
Upvotes: 19
Views: 46319
Reputation: 2490
AttributeError: 'DataFrame' object has no attribute 'append'
From this Stack Exchange answer:
As of pandas 2.0, append (previously deprecated) was removed.
You need to use concat instead (for most applications):
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
... it's also possible to use loc, although this only works if the new index label is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex):
df.loc[len(df)] = new_row # only use with a RangeIndex!
See the original answer by mozway for more details.
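Applied to the question's arr, the two quoted snippets might look like this. This is a minimal sketch assuming pandas 2.x and that arr holds exactly one value per column, in column order; the names df_concat and df_loc are illustrative and not part of the quoted answer.

```python
import numpy as np
import pandas as pd

# Empty frame and one row of values, as in the question
df = pd.DataFrame(columns=["col1", "col2", "col3"])
arr = np.array([756.0, 123.0, 452.0])

# Option 1: wrap the 1-D array in a list so it becomes a single row, and give it
# df's column labels; without columns=..., the new row would be labelled 0, 1, 2
df_concat = pd.concat(
    [df, pd.DataFrame([arr], columns=df.columns)], ignore_index=True
)

# Option 2: setting with enlargement via loc; only safe when the index is 0..len(df)-1
df_loc = df.copy()
df_loc.loc[len(df_loc)] = arr

print(df_concat)
print(df_loc)
```

The columns=df.columns argument matters: pd.concat aligns by column label, so a bare pd.DataFrame([arr]) would add three new columns named 0, 1, 2 instead of filling col1 to col3. On recent pandas versions, concatenating onto an empty frame may also emit a FutureWarning about how empty columns affect the result dtype.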
Upvotes: 0
Reputation: 571
@BalrogOfMoira is that really faster than simply creating the dataframe to append?
df = df.append(pd.DataFrame(arr.reshape(1, -1), columns=list(df)), ignore_index=True)  # pandas < 2.0 only
Otherwise, @Wonton, you could first concatenate the arrays and build a single DataFrame from them, which could then be appended to the original DataFrame.
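A minimal sketch of that idea, assuming each loop iteration produces one 1-D array of length 3. make_row is a hypothetical stand-in for whatever computes the values in the question's loop, and existing plays the role of the original DataFrame.

```python
import numpy as np
import pandas as pd

columns = ["col1", "col2", "col3"]

# Hypothetical stand-in for one iteration of the question's loop
def make_row(i):
    return np.array([i, i * 2, i * 3], dtype=float)

# Original frame that already holds some data
existing = pd.DataFrame([[756.0, 123.0, 452.0]], columns=columns)

# Collect the rows in a plain Python list (cheap, amortized O(1) per append) ...
rows = [make_row(i) for i in range(5)]

# ... stack them into one 2-D array and build a single DataFrame from it ...
new_block = pd.DataFrame(np.vstack(rows), columns=columns)

# ... then concatenate once with the original frame
df = pd.concat([existing, new_block], ignore_index=True)
print(df)
```

Each pd.concat (like the old df.append) copies the whole frame, so growing a DataFrame one row at a time over millions of iterations costs quadratic time overall; batching the rows and concatenating once avoids that.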
Upvotes: 14
Reputation: 134
@rafaelc's comment only works if your DataFrame's index runs from 0 to len(df)-1, so it is not a general workaround, and it can easily introduce a silent bug into your code.
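A small illustration of that silent failure, assuming a frame whose index has a gap because a row was dropped: len(df) is 2, but the label 2 already exists, so loc overwrites that row instead of appending a new one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, 2.0, 3.0],
                   "col2": [4.0, 5.0, 6.0],
                   "col3": [7.0, 8.0, 9.0]})
df = df.drop(index=1)      # index is now [0, 2], so len(df) == 2
arr = np.array([756.0, 123.0, 452.0])

df.loc[len(df)] = arr      # label 2 already exists: row 2 is overwritten, no error raised
print(df)                  # still 2 rows; the original row 2 is gone
```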
If you are sure that your NumPy array holds exactly one value per column of your DataFrame, in the same order, you could build a dict with a dict comprehension and pass it to the append function as follows:
data_to_append = {col: arr[i] for i, col in enumerate(df.columns)}
# DataFrame.append was removed in pandas 2.0; this line needs pandas < 2.0
df = df.append(data_to_append, ignore_index=True)
You need to reassign the DataFrame because the append function does not support in-place modification.
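Because DataFrame.append no longer exists in pandas 2.0 and later, the same dict can be passed to pd.concat instead. A minimal sketch, assuming arr has exactly one value per column, in column order; dict(zip(...)) is used here as a shorter equivalent of the dict comprehension above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0], "col2": [2.0], "col3": [3.0]})
arr = np.array([756.0, 123.0, 452.0])

# Map each column name to the value at the same position
data_to_append = dict(zip(df.columns, arr))

# pandas >= 2.0 replacement for df.append(data_to_append, ignore_index=True):
# wrap the dict in a list so it becomes a single row, then concatenate
df = pd.concat([df, pd.DataFrame([data_to_append])], ignore_index=True)
print(df)
```

Since the dict keys are the column names, pd.concat places each value in its matching column; the reassignment is still required because concat returns a new DataFrame.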
I hope it helps.
Upvotes: 4