Reputation: 79
I used pandas to read my CSV file from the cloud, then used replace() to turn 0 into a missing value, but it doesn't seem to work.
I am using Google Colab.
I tried two methods:
user_data = user_data.replace(0, np.nan)  # first
user_data.replace(0, np.nan, inplace=True)  # second
user_data.head()  # I use this to view the data.
But the data is unchanged from when I first read it; the 0 values are still there.
Here is the function I use to read the file. It reads the file in chunks:
import numpy as np
import pandas as pd

# Read function
def get_df2(file):
    mydata2 = []
    for chunk in pd.read_csv(file, chunksize=500000, header=None, sep='\t'):
        mydata2.append(chunk)
    user_data = pd.concat(mydata2, axis=0)
    names2 = ['user_id', 'age', 'gender', 'area', 'status']
    user_data.columns = names2
    return user_data
# read
user_data_path = 'a_url'
user_data = get_df2(user_data_path)
user_data.head()
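For reference, here is a self-contained sketch of the same chunked read. It assumes a small hypothetical in-memory sample (SAMPLE) in place of the real file at 'a_url', and passes the column names through read_csv's names= parameter instead of assigning columns afterwards. Printing dtypes is a quick way to see whether a column was parsed as numbers or as strings (object).

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the real file at 'a_url'.
SAMPLE = "1\t25\tM\tnorth\t0\n2\t0\tF\tsouth\t1\n3\t31\tF\teast\t0\n"


def get_df2(file, chunksize=500000):
    # names= labels the columns at read time, so no rename is needed afterwards.
    names2 = ['user_id', 'age', 'gender', 'area', 'status']
    reader = pd.read_csv(file, chunksize=chunksize, header=None, sep='\t', names=names2)
    # The reader yields one DataFrame per chunk; pd.concat accepts that iterable directly.
    return pd.concat(reader)


# A tiny chunksize forces more than one chunk even for this small sample.
user_data = get_df2(io.StringIO(SAMPLE), chunksize=2)
# Numeric columns show an integer dtype; string columns show 'object'.
print(user_data.dtypes)
```

If a column you expect to be numeric shows up as object here, its values are strings, and replace(0, ...) will not match them.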
Note: my code doesn't raise an error; it produces output, but not the output I want.
Upvotes: 1
Views: 702
Reputation: 408
Python can be irritating in situations like this.
As pointed out earlier, this is probably because the 0 values are strings rather than integers, which you can handle with:
user_data.replace("0",np.nan,inplace = True)
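A minimal sketch of why the integer version does nothing, assuming a small hypothetical frame whose zeros were parsed as strings: replace only matches cells whose value equals the one you pass, and the string "0" is not equal to the integer 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the zeros were parsed as strings (object dtype), not integers.
user_data = pd.DataFrame({'age': ['25', '0', '31'], 'status': ['0', '1', '0']})

# Replacing the integer 0 matches nothing, because no cell holds the int 0.
unchanged = user_data.replace(0, np.nan)
print(unchanged.isna().sum().sum())  # 0

# Replacing the string "0" matches the cells that actually contain "0".
fixed = user_data.replace("0", np.nan)
print(fixed.isna().sum().sum())  # 3
```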
That said, when you know what type of data a column in a pandas DataFrame should hold, you should set that type explicitly. That way, if a value cannot be converted, an error is raised and you know exactly where the problem is.
In your case, columns are:
names2=['user_id','age','gender','area','status']
Let's assume age holds integers and the other columns hold strings.
You can tell pandas which column is supposed to be which datatype by
user_data = user_data.astype({"user_id": str, "age": int, "gender": str, "area": str, "status": str})
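A short sketch of how the explicit conversion behaves, assuming hypothetical data in which every column was read as strings: a clean column converts silently, while a value that cannot be turned into an int makes astype raise a ValueError that points at the bad data.

```python
import pandas as pd

# Hypothetical data in which every column was read as strings.
user_data = pd.DataFrame({
    'user_id': ['1', '2'],
    'age': ['25', '31'],
    'gender': ['M', 'F'],
    'area': ['north', 'south'],
    'status': ['0', '1'],
})

schema = {'user_id': str, 'age': int, 'gender': str, 'area': str, 'status': str}
user_data = user_data.astype(schema)
print(user_data.dtypes)

# A value that cannot be converted to int makes astype raise instead of passing silently.
bad = user_data.assign(age=['25', 'unknown'])
try:
    bad.astype(schema)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("conversion failed:", exc)
```

One caveat: a plain int column cannot hold NaN, so if you first replace the zeros in age with np.nan, converting with int will also raise. In that case float or pandas' nullable "Int64" dtype is the usual choice.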
There are many other ways to do this, as mentioned in the following answer; choose whichever suits your needs.
Upvotes: 0
Reputation: 24135
Your 0s are probably just strings; try using:
user_data = user_data.replace('0', np.nan)
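If you are not sure which columns hold the integer 0 and which hold the string "0", replace also accepts a list of values to match. A minimal sketch, assuming a hypothetical frame that mixes both:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'age' holds integer zeros, 'status' holds string zeros.
user_data = pd.DataFrame({'age': [25, 0, 31], 'status': ['0', '1', '0']})

# With a list as to_replace, any cell equal to 0 or "0" becomes NaN.
cleaned = user_data.replace([0, '0'], np.nan)
print(cleaned.isna().sum())
```

Note that the age column becomes float after this, since NaN cannot be stored in an integer column.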
Upvotes: 1