Reputation: 586
Here is a sample data
import pandas as pd
cols = ['Country','Name','SomeNumber','SomeDate']
sourceData = [('WI','Vivian',34,'#1985-01-01#'),
('IND','Sam',56,'#1988-02-01#'),
('NZ','Richard',324,'#1987-07-01#'),
('AUS','Don',98,'#1998-07-12#'),
('SL','Simth',101,'#2001-07-12#'),]
x = pd.DataFrame(sourceData,columns=cols)
x
Country Name SomeNumber SomeDate
0 WI Vivian 34 #1985-01-01#
1 IND Sam 56 #1988-02-01#
2 NZ Richard 324 #1987-07-01#
3 AUS Don 98 #1998-07-12#
4 SL Simth 101 #2001-07-12#
What I want to do is, update every column, every value in the table with values missing, except 'Name' columns
Now, update the dataframe should look like this:
Country Name SomeNumber SomeDate
0 MISSING Vivian MISSING MISSING
1 MISSING Sam MISSING MISSING
2 MISSING Richard MISSING MISSING
3 MISSING Don MISSING MISSING
4 MISSING Simth MISSING MISSING
Be advised, I do not want to do something like this considering, in the real world, I have 114 columns:
x['Country'] = 'MISSING'
x['SomeNumber'] = 'MISSING'
x['SomeDate'] = 'MISSING'
I tried this:
cols.remove('Name')
x[cols] = 'MISSING"
But it gave me the following warning, which I want to avoid:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#inde
xing-view-versus-copy
inTardisMissingInSource[cols] = 'MISSING'
C:\tardis\desktop\environment\python\lib\site-packages\pandas\core\indexing.py:477: SettingWithCopyW
arning:
Upvotes: 1
Views: 311
Reputation: 403248
The SettingWithCopyWarning
is a good indication that you're using boolean indexing in the wrong place. You should instead use df.loc
, like this:
In [1430]: x.loc[:, x.columns.difference(['Name'])] = 'MISSING'
In [1431]: x
Out[1431]:
Country Name SomeNumber SomeDate
0 MISSING Vivian MISSING MISSING
1 MISSING Sam MISSING MISSING
2 MISSING Richard MISSING MISSING
3 MISSING Don MISSING MISSING
4 MISSING Simth MISSING MISSING
The main bit is the x.columns.difference([...])
. Pass in a list of column headers you want to exclude, and those columns will not be picked up for the assignment.
Note that this mixed assignment will change the dtype
of the rows, use with caution.
If you don't want inplace assignment, you can use df.assign
by unpacking a dictionary:
In [1435]: x.assign(**{ k : 'MISSING' for k in x.columns.difference(['Name'])})
Out[1435]:
Country Name SomeNumber SomeDate
0 MISSING Vivian MISSING MISSING
1 MISSING Sam MISSING MISSING
2 MISSING Richard MISSING MISSING
3 MISSING Don MISSING MISSING
4 MISSING Simth MISSING MISSING
Upvotes: 4