HMLDude

Reputation: 1637

Copying a Single Column From One Pandas Dataframe to Another Results in Missing Values

I am trying to copy the values of a single column from one Pandas dataframe into another. The two dataframes have the same number of rows, so I did not expect this to cause any issues. However, although no error is raised, the output is wrong.

The last two rows of the new column end up as NaN, and one of the values from the source column is omitted in the process.

Here is the first dataframe, `ds1`:

+----+-----------+-------+-----------+------------+--------------------+
|    | Unique ID |  Zip  |  Revenue  | Population | Revenue_Per_Person |
+----+-----------+-------+-----------+------------+--------------------+
|  1 |       179 | 75208 |  67789037 |      30171 |     2246.827649067 |
|  2 |       186 | 75208 |  62488032 |      30171 |    2071.1289649001 |
|  3 |       180 | 75212 | 107230739 |      24884 |    4309.2243610352 |
|  4 |       182 | 75212 |  81768596 |      24884 |    3285.9908374859 |
|  5 |       181 | 75137 |  93296769 |      18861 |    4946.5441386989 |
|  6 |       183 | 75237 |  79177044 |      17101 |    4629.9657329981 |
|  7 |       187 | 75237 |  60000000 |      17101 |    3508.5667504824 |
|  9 |       185 | 75236 |  76489996 |      15949 |    4795.9117186031 |
| 10 |       189 | 75236 |  55203335 |      15949 |    3461.2411436454 |
| 11 |       188 | 75115 |  57451134 |      48877 |     1175.422673241 |
+----+-----------+-------+-----------+------------+--------------------+

And the second, `ds2`:

+---+-----------+-------+---------+
|   |     0     |   1   | cluster |
+---+-----------+-------+---------+
| 0 |  67789037 | 30171 |       1 |
| 1 |  62488032 | 30171 |       1 |
| 2 | 107230739 | 24884 |       0 |
| 3 |  81768596 | 24884 |       0 |
| 4 |  93296769 | 18861 |       0 |
| 5 |  79177044 | 17101 |       0 |
| 6 |  60000000 | 17101 |       1 |
| 7 |  76489996 | 15949 |       0 |
| 8 |  55203335 | 15949 |       1 |
| 9 |  57451134 | 48877 |       2 |
+---+-----------+-------+---------+

Here is my original code:

ds1['Type'] = ds2['cluster']

When I check the values of ds1 after running the above line, I get the following values in the ds1 dataframe.

+----+-----------+-------+--------------------+------------+--------------------+------+
|    | Unique ID | Zip   | Revenue            | Population | Revenue_Per_Person | Type |
+----+-----------+-------+--------------------+------------+--------------------+------+
| 1  | 179       | 75208 | 67789037.0         | 30171      | 2246.827649066985  | 1.0  |
| 2  | 186       | 75208 | 62488032.0         | 30171      | 2071.1289649000696 | 0.0  |
| 3  | 180       | 75212 | 107230738.99999999 | 24884      | 4309.2243610352025 | 0.0  |
| 4  | 182       | 75212 | 81768596.0         | 24884      | 3285.9908374859347 | 0.0  |
| 5  | 181       | 75137 | 93296769.0         | 18861      | 4946.544138698902  | 0.0  |
| 6  | 183       | 75237 | 79177044.0         | 17101      | 4629.96573299807   | 1.0  |
| 7  | 187       | 75237 | 60000000.0         | 17101      | 3508.566750482428  | 0.0  |
| 9  | 185       | 75236 | 76489995.99999999  | 15949      | 4795.911718603046  | 2.0  |
| 10 | 189       | 75236 | 55203334.99999999  | 15949      | 3461.241143645369  | nan  |
| 11 | 188       | 75115 | 57451133.99999999  | 48877      | 1175.4226732409925 | nan  |
+----+-----------+-------+--------------------+------------+--------------------+------+

Interestingly, this code does emit the following warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

So I tried an alternative approach:

ds1['Type'] = ds2.loc[:,'cluster']

This produces the same warning and the same result: one value is missing and the last two rows are NaN.

Upvotes: 1

Views: 4441

Answers (1)

piRSquared

Reputation: 294358

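This is due to index misalignment. Pandas matches rows by index label, not by position, when you assign a series to a column. Notice that ds1 has index values of 10 and 11, and you are assigning to a new column of ds1 a series that has no such index values. That results in missing values for those two rows. Likewise, the values at index 0 and 8 in ds2 have no matching row in ds1, so they are dropped.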
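
A minimal sketch that reproduces the misalignment. It uses only the index labels and `cluster` values shown in the question's tables; the other columns are left out for brevity, and the names `missing_in_ds2` and `dropped_from_ds2` are just illustrative.

```python
import pandas as pd

# ds1 uses the index labels from the question: no 0 or 8, but 10 and 11 exist
ds1 = pd.DataFrame(
    {"Unique ID": [179, 186, 180, 182, 181, 183, 187, 185, 189, 188]},
    index=[1, 2, 3, 4, 5, 6, 7, 9, 10, 11],
)

# ds2 has the default 0..9 index
ds2 = pd.DataFrame({"cluster": [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]})

# Column assignment from a Series aligns on index labels, not on position
ds1["Type"] = ds2["cluster"]
print(ds1["Type"].tolist())

# Labels present only in ds1 become NaN in the new column
missing_in_ds2 = ds1.index.difference(ds2.index).tolist()
# Labels present only in ds2 have their values silently dropped
dropped_from_ds2 = ds2.index.difference(ds1.index).tolist()
print(missing_in_ds2, dropped_from_ds2)
```

Comparing the two indexes with `Index.difference` before assigning is a quick way to spot this kind of problem.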

Assign the underlying array of values from the right side, rather than the series itself, to the column on the left. An array carries no index, so this bypasses the alignment issue:

ds1['Type'] = ds2['cluster'].values

If the index is meaningless to you, you could call reset_index on both dataframes ahead of time:

ds1.reset_index(drop=True, inplace=True)
ds2.reset_index(drop=True, inplace=True)

ds1['Type'] = ds2['cluster']
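
Both approaches can be checked side by side. The sketch below rebuilds cut-down versions of the two frames from the question's tables (an assumption for illustration; only the index labels and the `cluster` values are taken from the question). It uses `.to_numpy()`, which serves the same purpose here as `.values`, and the non-inplace form of `reset_index` so the originals stay untouched.

```python
import pandas as pd

ds1 = pd.DataFrame(
    {"Unique ID": [179, 186, 180, 182, 181, 183, 187, 185, 189, 188]},
    index=[1, 2, 3, 4, 5, 6, 7, 9, 10, 11],
)
ds2 = pd.DataFrame({"cluster": [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]})

# Positional assignment only makes sense if the row counts match
assert len(ds1) == len(ds2)

# Option 1: strip the index from the right-hand side; ds1 keeps its own labels
by_position = ds1.copy()
by_position["Type"] = ds2["cluster"].to_numpy()

# Option 2: give both frames a fresh 0..n-1 index, then assign as usual
ds1_reset = ds1.reset_index(drop=True)
ds2_reset = ds2.reset_index(drop=True)
ds1_reset["Type"] = ds2_reset["cluster"]

print(by_position["Type"].tolist())  # [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]
print(ds1_reset["Type"].tolist())    # [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]
```

Option 1 preserves ds1's original index labels, while option 2 replaces them; which one fits depends on whether those labels are needed later.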

Upvotes: 4
