Replace a column in a data frame with a numpy array

Question

I have a pandas data frame with shape 1725 rows X 4 columns.

      date     size           state type   
408      1    32000        Virginia  EDU
...

I need to replace the state column with the following numpy array with shape (1725, 52).

[[0. 1. 0. ... 0. 0. 0.]
...
[0. 0. 1. ... 0. 0. 0.]]

The final result should be like this:

      date     size                   state type   
408      1    32000 [0. 1. 0. ... 0. 0. 0.]  EDU
...

So far I tried the following based on this answer:

col = 2
df.iloc[:, col] = np_arr.tolist()

The problem is that I get this error:

    dataSet.iloc[:, col] = tempData.tolist()
  File "/home/marcus/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 205, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/home/marcus/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 527, in _setitem_with_indexer
    "Must have equal len keys and value "
ValueError: Must have equal len keys and value when setting with an ndarray

Celius Stingher · Accepted Answer

I believe you need to reshape your array so that it becomes a single feature before adding it to the column. This problem often arises during preprocessing. Try the following:

df['state'] = np_arr.reshape(-1,1)

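If that doesn't work, you can try first turning it into a dense array and then into a list. Note that toarray() is a method of SciPy sparse matrices (for example, the default output of scikit-learn's OneHotEncoder), not of plain NumPy arrays, so this variant applies only if np_arr is actually sparse: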

df['state'] = np_arr.toarray().tolist()
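
For a plain NumPy array, the list-based assignment can be sketched as below. This is only an illustration under assumptions: the 3-row frame and the 3 x 4 array are made-up stand-ins for the 1725-row frame and the (1725, 52) array in the question. It also shows why reshape(-1, 1) may not fit here: it flattens every value into its own row, so the result no longer has one row per DataFrame row.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row stand-in for the 1725-row frame in the question
df = pd.DataFrame(
    {
        "date": [1, 2, 3],
        "size": [32000, 15000, 8000],
        "state": ["Virginia", "Texas", "Ohio"],
        "type": ["EDU", "GOV", "EDU"],
    },
    index=[408, 409, 410],
)

# Hypothetical one-hot array: one row per DataFrame row
# (3 x 4 here, 1725 x 52 in the question)
np_arr = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
)

# reshape(-1, 1) yields 12 rows here, which no longer matches len(df) == 3
assert np_arr.reshape(-1, 1).shape == (12, 1)

# tolist() gives a list of 3 inner lists; pandas stores one inner list per cell
df["state"] = np_arr.tolist()

print(df)
```

The state column ends up with object dtype, each cell holding one Python list, which matches the layout the question asks for.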

Working with multiple columns: you can do these replacements in a for loop. list(df) returns a list of all the column names, which you can then access by their position (alternatively, you can use iloc[]):

cols = list(df)  # Get a list with all column names
column_positions = [0, 2, 4, 5]  # Select the columns in positions 0, 2, 4 and 5
for i in column_positions:
    # Replace the values of each selected column
    df[cols[i]] = np_arr.tolist()
