Oo'-

Reputation: 236

The pandas SettingWithCopyWarning still shows, although the code works and the visualization loads normally

I was tempted to use pd.options.mode.chained_assignment = None, but I want the code to be free of errors.

My starting code:

import datetime
import altair as alt
import operator
import pandas as pd
s = pd.read_csv('../../data/aparecida-small-sample.csv', parse_dates=['date'])

city = s[s['city'] == 'Aparecida']

Based on @dpkandy's code:

city['total_cases'] = city['totalCases']
city['total_deaths'] = city['totalDeaths']
city['total_recovered'] = city['totalRecovered']

tempTotalCases = city[['date','total_cases']]
tempTotalCases["title"] = "Confirmed"

tempTotalDeaths = city[['date','total_deaths']]
tempTotalDeaths["title"] = "Deaths"

tempTotalRecovered = city[['date','total_recovered']]
tempTotalRecovered["title"] = "Recovered"

temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)

totalCases = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_cases:Q', title = None))
totalDeaths = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_deaths:Q', title = None))
totalRecovered = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_recovered:Q', title = None))

(totalCases + totalRecovered + totalDeaths).encode(color=alt.Color('title', scale = alt.Scale(range = ['#106466','#DC143C','#87C232']), legend = alt.Legend(title="Legend colour"))).properties(title = "Cumulative number of confirmed cases, deaths and recovered", width = 800)

This code works and the visualization loads normally, but pandas still shows the error, suggesting that I set .loc[row_indexer,col_indexer] = value instead. I read the documentation section "Returning a view versus a copy" that the message links to and tried the code below, but it still shows the same error. Here is the code with loc:

# 1st attempt
tempTotalCases.loc["title"] = "Confirmed"
tempTotalDeaths.loc["title"] = "Deaths"
tempTotalRecovered.loc["title"] = "Recovered"

# 2nd attempt
tempTotalCases["title"].loc = "Confirmed"
tempTotalDeaths["title"].loc = "Deaths"
tempTotalRecovered["title"].loc = "Recovered"

Here is the error message:

<ipython-input-6-f16b79f95b84>:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tempTotalCases["title"] = "Confirmed"
<ipython-input-6-f16b79f95b84>:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tempTotalDeaths["title"] = "Deaths"
<ipython-input-6-f16b79f95b84>:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tempTotalRecovered["title"] = "Recovered"

Jupyter and Pandas version:

$ jupyter --version
jupyter core     : 4.7.1
jupyter-notebook : 6.3.0
qtconsole        : 5.0.3
ipython          : 7.22.0
ipykernel        : 5.5.3
jupyter client   : 6.1.12
jupyter lab      : 3.1.0a3
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.1.3
traitlets        : 5.0.5

$ pip show pandas
Name: pandas
Version: 1.2.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/gus/PUC/.env/lib/python3.9/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: ipychart, altair

Update 2

I followed the answer and it worked, but now there is another problem:

temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)

Error log:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  iloc._setitem_with_indexer(indexer, value, self.name)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-7-b2649a676837> in <module>
     17 tempTotalRecovered.loc["title"] = _("Recovered")
     18 
---> 19 temp = tempTotalCases.append(tempTotalDeaths)
     20 temp = temp.append(tempTotalRecovered)
     21 
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity, sort)
   7980             to_concat = [self, other]
   7981         return (
-> 7982             concat(
   7983                 to_concat,
   7984                 ignore_index=ignore_index,
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    296     )
    297 
--> 298     return op.get_result()
    299 
    300 
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/reshape/concat.py in get_result(self)
    514                     obj_labels = obj.axes[1 - ax]
    515                     if not new_labels.equals(obj_labels):
--> 516                         indexers[ax] = obj_labels.get_indexer(new_labels)
    517 
    518                 mgrs_indexers.append((obj._mgr, indexers))
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3169 
   3170         if not self.is_unique:
-> 3171             raise InvalidIndexError(
   3172                 "Reindexing only valid with uniquely valued Index objects"
   3173             )
InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Upvotes: 0

Views: 291

Answers (1)

Jason

Reputation: 4546

This SettingWithCopyWarning is a warning, not an error. The distinction matters: a warning means pandas isn't sure whether your code will produce the intended output and leaves that decision to the programmer, whereas an error means that something is definitely wrong.

The SettingWithCopyWarning is warning you about the difference between doing something like df['First selection']['Second selection'] and doing df.loc[:, ('First selection', 'Second selection')].

In the first case two separate events occur: df['First selection'] runs first, and the object it returns is then used for the next selection, returned_df['Second selection']. pandas has no way to know whether returned_df is the original df or just a temporary 'view' of it. Most of the time it doesn't matter (see the docs for more info), but if you change a value on a temporary object you'll be confused as to why your code runs error-free yet the changes you made are not reflected in df. Using .loc bundles 'First selection' and 'Second selection' into a single call, so pandas can apply the assignment directly to the original df rather than to an intermediate object.
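As a rough illustration of the difference, here is a minimal sketch. The DataFrame and its column names a and b are made up for the demo, and the warning is silenced only to keep the output short:

```python
import warnings
import pandas as pd

# Hypothetical toy data, not the data from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained indexing: df[mask] runs first and returns a new object, then
# ["b"] = 0 is applied to that temporary object, so df is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo only
    df[df["a"] > 1]["b"] = 0
chained_result = df["b"].tolist()

# Single .loc call: row selection and column selection are handed to pandas
# together, so the assignment is applied to df itself.
df.loc[df["a"] > 1, "b"] = 0
loc_result = df["b"].tolist()

print(chained_result)
print(loc_result)
```

The first list should still hold the original values of b, while the second should show zeros in the rows where a is greater than 1.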

The documentation you linked shows why your attempts to use .loc didn't work as you intended (example taken from the docs):

def do_something(df):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    # We don't know whether this will modify df or not!
    foo['quux'] = value
    return foo

You have something similar in your code. Look at how tempTotalCases is created:

city = s[s['city'] == 'Aparecida']
# some lines of code    
tempTotalCases = city[['date','total_cases']]

And then some more lines of code before you attempt to do:

tempTotalCases.loc["title"] = "Confirmed"

So pandas emits the warning.
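It may also help to see what the two .loc forms actually address. With a single label, .loc treats that label as a row indexer, so .loc["title"] = ... writes a row labelled "title" instead of a column. The sketch below uses a made-up two-row frame, and the explicit .copy() is my assumption about how to get an independent object to write to; it is not taken from the question:

```python
import pandas as pd

# Hypothetical stand-in for the question's city frame.
city = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"],
                     "total_cases": [5, 8]})

# One label only: interpreted as a row indexer, so a new ROW named
# "title" is appended and every column in it is set to "Confirmed".
by_row = city[["date", "total_cases"]].copy()
by_row.loc["title"] = "Confirmed"

# Row indexer and column indexer together: all rows, column "title",
# so a new COLUMN is created.
by_col = city[["date", "total_cases"]].copy()
by_col.loc[:, "title"] = "Confirmed"

print(by_row.shape)   # rows grew, columns did not
print(by_col.shape)   # columns grew, rows did not
```

Because each frame is an explicit copy, neither assignment has anything to do with the original city frame.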

Separate from your original question, you might find df.rename() useful (see the pandas docs).

You'll be able to do something like:

city = city.rename(columns={'totalCases': 'total_cases',
                            'totalDeaths': 'total_deaths',
                            'totalRecovered': 'total_recovered'})
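One possible way to combine the rename with the rest of the pipeline is sketched below. The sample values are invented, and the use of .copy(), .assign() and pd.concat() is my own suggestion rather than something from the question, so treat it as one option among several:

```python
import pandas as pd

# Small made-up stand-in for the CSV in the question.
s = pd.DataFrame({
    "city": ["Aparecida", "Aparecida", "Other"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01"]),
    "totalCases": [5, 8, 1],
    "totalDeaths": [0, 1, 0],
})

# .copy() makes city an independent DataFrame, so later changes
# cannot be mistaken for writes into a slice of s.
city = s[s["city"] == "Aparecida"].copy()
city = city.rename(columns={"totalCases": "total_cases",
                            "totalDeaths": "total_deaths"})

# assign() returns a new DataFrame with the extra column instead of
# modifying the selection in place.
cases = city[["date", "total_cases"]].assign(title="Confirmed")
deaths = city[["date", "total_deaths"]].assign(title="Deaths")

# pd.concat stacks the pieces; ignore_index=True builds a fresh index.
temp = pd.concat([cases, deaths], ignore_index=True)
print(temp)
```

The resulting temp has one row per date and series, with NaN in the value column that does not apply to that row, which is the same layout the appended frames in the question produce.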

Upvotes: 1
