Wonton

Reputation: 339

Add A 1-D Numpy Array to DataFrame as a Row

Is there a function which allows you to efficiently append a NumPy array directly to a DataFrame?

Variables:

df = pd.DataFrame(columns=['col1', 'col2', 'col3'])

Out[1]: +------+------+------+
        | col1 | col2 | col3 |
        +------+------+------+
        |      |      |      |
        +------+------+------+


arr = np.empty(3)

# array is populated with values. Random numbers are chosen in this example,
#    but in my program, the numbers are not arbitrary.
arr[0] = 756
arr[1] = 123
arr[2] = 452

Out[2]: array([756., 123., 452.])

How do I directly append arr to the end of df to get this?

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|  756 |  123 |  452 |
+------+------+------+

I've tried using df.append(arr), but it doesn't accept NumPy arrays. I could convert the NumPy array into a DataFrame and then append it, but I think that would be very inefficient, especially over millions of iterations. Is there a more efficient way to do it?

Upvotes: 19

Views: 46319

Answers (4)

DrWhat

Reputation: 2490

AttributeError: 'DataFrame' object has no attribute 'append'

From this Stack Exchange answer:

As of pandas 2.0, append (previously deprecated) was removed.

You need to use concat instead (for most applications):

df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

... it's also possible to use loc, although this only works if the new index label is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex):

df.loc[len(df)] = new_row # only use with a RangeIndex!

See the original answer by mozway for more details.
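Applied to the question's df and arr, the two options might look like the sketch below. The names row, df_concat and df_loc are only for illustration, and arr is built with np.array here instead of np.empty for brevity. Depending on the pandas version, concatenating onto an empty frame may emit a FutureWarning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=["col1", "col2", "col3"])
arr = np.array([756.0, 123.0, 452.0])

# Option 1: wrap the array as a one-row DataFrame, then concatenate.
row = pd.DataFrame([arr], columns=df.columns)
df_concat = pd.concat([df, row], ignore_index=True)

# Option 2: assign to the next integer label.
# Only safe because the index here runs from 0 to len(df) - 1.
df_loc = df.copy()
df_loc.loc[len(df_loc)] = arr

print(df_concat)
print(df_loc)
```

Both variants still touch the DataFrame once per row, so neither is likely to be fast inside a loop with millions of iterations.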

Upvotes: 0

braulio

Reputation: 571

@BalrogOfMoria is that really faster than simply creating the DataFrame to append?

df.append(pd.DataFrame(arr.reshape(1,-1), columns=list(df)), ignore_index=True)

Otherwise, @Wonton, you could simply concatenate the arrays first and then write them to a DataFrame, which could then be appended to the original DataFrame.
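A minimal sketch of that idea, assuming the arrays can be held in memory until the loop finishes. The loop body and its values are stand-ins for whatever your program computes, and the names columns and rows are made up for the example.

```python
import numpy as np
import pandas as pd

columns = ["col1", "col2", "col3"]

rows = []  # plain Python list; appending to it is cheap
for i in range(5):
    # Stand-in for the real per-iteration computation.
    arr = np.array([i, i * 2, i * 3], dtype=float)
    rows.append(arr)

# Stack the 1-D arrays into one 2-D array and build the DataFrame once.
df = pd.DataFrame(np.vstack(rows), columns=columns)
print(df)
```

This way the DataFrame is constructed a single time rather than being copied on every iteration.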

Upvotes: 14

Mehdi Shafiei

Reputation: 79

This will work:

df.append(pd.DataFrame(arr).T)

Upvotes: 7

BalrogOfMoria

Reputation: 134

@rafaelc's comment works only if your Pandas DataFrame is indexed from 0 to len(df)-1, so it is not a general workaround and it can easily produce a silent bug in your code.
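To illustrate the silent bug, here is a small sketch that assumes the comment refers to the df.loc[len(df)] = arr pattern. The frame and its index labels are invented for the example.

```python
import numpy as np
import pandas as pd

# A frame whose index labels are 1 and 2 rather than 0 and 1.
df = pd.DataFrame(
    {"col1": [1.0, 2.0], "col2": [3.0, 4.0], "col3": [5.0, 6.0]},
    index=[1, 2],
)
arr = np.array([756.0, 123.0, 452.0])

# len(df) is 2 and the label 2 already exists,
# so this overwrites that row instead of adding a new one.
df.loc[len(df)] = arr

print(len(df))  # still 2 rows
print(df)
```

No error or warning is raised; the previous contents of the row labeled 2 are simply lost.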

If you are sure that your NumPy array has the same columns as your Pandas DataFrame, you could try using the append function with a dict built from the column names, as follows:

data_to_append = {}
for i in range(len(df.columns)):
    data_to_append[df.columns[i]] = arr[i]
df = df.append(data_to_append, ignore_index = True)

You need to reassign the DataFrame because the append function does not support in-place modification.

I hope it helps.

Upvotes: 4
