I am trying to append the values from a single column of one Pandas dataframe to another as a new column. The two dataframes have the same number of rows, so I did not expect this to cause any issues. However, while it throws no errors, the output is wrong: the last two rows of the new column are nan, and one of the values appears to be omitted in the process.
Here is the first dataframe, ds1:
+----+-----------+-------+-----------+------------+--------------------+
| | Unique ID | Zip | Revenue | Population | Revenue_Per_Person |
+----+-----------+-------+-----------+------------+--------------------+
| 1 | 179 | 75208 | 67789037 | 30171 | 2246.827649067 |
| 2 | 186 | 75208 | 62488032 | 30171 | 2071.1289649001 |
| 3 | 180 | 75212 | 107230739 | 24884 | 4309.2243610352 |
| 4 | 182 | 75212 | 81768596 | 24884 | 3285.9908374859 |
| 5 | 181 | 75137 | 93296769 | 18861 | 4946.5441386989 |
| 6 | 183 | 75237 | 79177044 | 17101 | 4629.9657329981 |
| 7 | 187 | 75237 | 60000000 | 17101 | 3508.5667504824 |
| 9 | 185 | 75236 | 76489996 | 15949 | 4795.9117186031 |
| 10 | 189 | 75236 | 55203335 | 15949 | 3461.2411436454 |
| 11 | 188 | 75115 | 57451134 | 48877 | 1175.422673241 |
+----+-----------+-------+-----------+------------+--------------------+
And the second, ds2:
+---+-----------+-------+---------+
| | 0 | 1 | cluster |
+---+-----------+-------+---------+
| 0 | 67789037 | 30171 | 1 |
| 1 | 62488032 | 30171 | 1 |
| 2 | 107230739 | 24884 | 0 |
| 3 | 81768596 | 24884 | 0 |
| 4 | 93296769 | 18861 | 0 |
| 5 | 79177044 | 17101 | 0 |
| 6 | 60000000 | 17101 | 1 |
| 7 | 76489996 | 15949 | 0 |
| 8 | 55203335 | 15949 | 1 |
| 9 | 57451134 | 48877 | 2 |
+---+-----------+-------+---------+
Here is my original code:
ds1['Type'] = ds2['cluster']
After running the line above, ds1 looks like this:
+----+-----------+-------+--------------------+------------+--------------------+------+
| | Unique ID | Zip | Revenue | Population | Revenue_Per_Person | Type |
+----+-----------+-------+--------------------+------------+--------------------+------+
| 1 | 179 | 75208 | 67789037.0 | 30171 | 2246.827649066985 | 1.0 |
| 2 | 186 | 75208 | 62488032.0 | 30171 | 2071.1289649000696 | 0.0 |
| 3 | 180 | 75212 | 107230738.99999999 | 24884 | 4309.2243610352025 | 0.0 |
| 4 | 182 | 75212 | 81768596.0 | 24884 | 3285.9908374859347 | 0.0 |
| 5 | 181 | 75137 | 93296769.0 | 18861 | 4946.544138698902 | 0.0 |
| 6 | 183 | 75237 | 79177044.0 | 17101 | 4629.96573299807 | 1.0 |
| 7 | 187 | 75237 | 60000000.0 | 17101 | 3508.566750482428 | 0.0 |
| 9 | 185 | 75236 | 76489995.99999999 | 15949 | 4795.911718603046 | 2.0 |
| 10 | 189 | 75236 | 55203334.99999999 | 15949 | 3461.241143645369 | nan |
| 11 | 188 | 75115 | 57451133.99999999 | 48877 | 1175.4226732409925 | nan |
+----+-----------+-------+--------------------+------------+--------------------+------+
Interestingly, this code does throw the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
So I tried an alternative approach:
ds1['Type'] = ds2.loc[:,'cluster']
This produces the same warning and the same outcome: one missing value and two nan values at the end.
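The nan values are due to index misalignment. Notice that ds1 has index values of 10 and 11, and you are assigning a Series without those indices to a new column of ds1. That results in missing values for those two indices. Likewise, ds2 has values at labels 0 and 8, which do not exist in ds1, so those values are silently dropped; that is why values seem to be missing from the new column.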
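A minimal reproduction, rebuilding only the Unique ID and cluster columns with the index labels from the tables in the question (the other columns are left out for brevity), shows both effects: labels 10 and 11 receive nan, and ds2's values at labels 0 and 8 are dropped:

```python
import pandas as pd

# Only the columns needed to reproduce the problem, with the question's labels:
# ds1 runs from 1 to 11 but skips 8; ds2 has the default RangeIndex 0-9.
ds1 = pd.DataFrame(
    {"Unique ID": [179, 186, 180, 182, 181, 183, 187, 185, 189, 188]},
    index=[1, 2, 3, 4, 5, 6, 7, 9, 10, 11],
)
ds2 = pd.DataFrame({"cluster": [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]})

# Assigning a Series matches rows by index label, not by position.
ds1["Type"] = ds2["cluster"]
print(ds1["Type"].tolist())  # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, nan, nan]

# Labels in ds1 with no partner in ds2 receive NaN ...
print(ds1.index.difference(ds2.index).tolist())  # [10, 11]
# ... and labels in ds2 with no partner in ds1 are silently dropped.
print(ds2.index.difference(ds1.index).tolist())  # [0, 8]
```

The column also becomes float, because NaN cannot be stored in an integer column; that matches the 1.0 / 0.0 values in the question's output.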
Assign the underlying values from the right side to the column on the left. A bare array carries no index, so it is assigned by position, which bypasses the alignment issue.
ds1['Type'] = ds2['cluster'].values
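Here is a small sketch of the positional assignment on the same reduced frames (only Unique ID and cluster are rebuilt). It also shows that a length mismatch is not padded with nan; pandas raises a ValueError instead:

```python
import pandas as pd

ds1 = pd.DataFrame(
    {"Unique ID": [179, 186, 180, 182, 181, 183, 187, 185, 189, 188]},
    index=[1, 2, 3, 4, 5, 6, 7, 9, 10, 11],
)
ds2 = pd.DataFrame({"cluster": [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]})

# .values is a plain NumPy array with no index, so it is assigned by position.
ds1["Type"] = ds2["cluster"].values
print(ds1["Type"].tolist())  # [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]

# Positional assignment requires equal lengths; a shorter array is rejected.
try:
    ds1["Short"] = ds2["cluster"].values[:5]
except ValueError as exc:
    print("ValueError:", exc)
```

In newer pandas versions, ds2['cluster'].to_numpy() is the recommended spelling of .values and behaves the same way here. ds1 keeps its original labels (1 to 11, skipping 8).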
If the index is meaningless to you, you could reset the index of both dataframes ahead of time:
ds1.reset_index(drop=True, inplace=True)
ds2.reset_index(drop=True, inplace=True)
ds1['Type'] = ds2['cluster']
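The same approach can be written without inplace=True by reassigning the results. This is a minimal sketch on the reduced frames (only Unique ID and cluster are rebuilt); note that drop=True discards the original labels rather than keeping them as a column:

```python
import pandas as pd

ds1 = pd.DataFrame(
    {"Unique ID": [179, 186, 180, 182, 181, 183, 187, 185, 189, 188]},
    index=[1, 2, 3, 4, 5, 6, 7, 9, 10, 11],
)
ds2 = pd.DataFrame({"cluster": [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]})

# Replace both indexes with 0..n-1 so the labels line up row for row.
ds1 = ds1.reset_index(drop=True)
ds2 = ds2.reset_index(drop=True)

# The labels now match one to one, so label alignment gives the expected result.
ds1["Type"] = ds2["cluster"]
print(ds1["Type"].tolist())  # [1, 1, 0, 0, 0, 0, 1, 0, 1, 2]
```

If the old labels are still needed later, omit drop=True; reset_index then keeps them in a new column named index.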
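The SettingWithCopyWarning mentioned in the question is a separate issue from the nan values. The question does not show how ds1 was built, but this warning typically appears when ds1 was obtained by filtering or slicing another DataFrame (the gap at label 8 hints at that). A minimal sketch, assuming a hypothetical parent frame named df: taking an explicit .copy() of the filtered result makes ds1 independent, so adding a column to it does not trigger the warning. The copy does not change index alignment, so the fixes above are still needed for the nan values.

```python
import warnings

import pandas as pd

# Hypothetical parent frame; the question does not show where ds1 comes from.
df = pd.DataFrame({"Population": [30171, 24884, 18861, 17101, 15949]})

# .copy() makes the filtered frame explicitly independent of df.
ds1 = df[df["Population"] > 17000].copy()

# Record any warnings raised while adding the new column.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ds1["Type"] = [1, 1, 0, 0]  # four rows survive the filter

print([w.category.__name__ for w in caught])
```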