Reputation: 59
I am reading a CSV file into a variable called 'data' in a Jupyter Notebook using pandas, as follows:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("C:/Users/hp/Desktop/dv project/googleplaystorecleaned.csv")
I tried to modify the 'Size' column of the data set to remove the characters 'M' and 'k' using the following code:
for i in range(len(data['Size'])):
    data['Size'][i]=str(data['Size'][i])
    data['Size'][i]=data['Size'][i].replace('M','')
    data['Size'][i]=data['Size'][i].replace('k','')
    data['Size'][i]=data['Size'][i].replace('Varies with device','')
    data['Size'][i]=float(data['Size'][i])
print(data['Size'])
The code seems to work only partially on the data set, as I am getting the following output:
0 19
1 14
2 8.7
3 25
4 2.8
...
10836 53M
10837 3.6M
10838 9.5M
10839 Varies with device
10840 19M
Name: Size, Length: 10829, dtype: object
Please suggest a proper way to do this.
Upvotes: 1
Views: 3187
Reputation: 421
Hi, you can also try this:
import pandas as pd
list1 = ['20M', '9M', '10K', '10']
dataframe1 = pd.DataFrame(data=list1, columns=['Size'])
# Create the target column first; writing into a missing column raises KeyError
dataframe1['Col1'] = dataframe1['Size']
for i, s in enumerate(dataframe1['Size']):
    # Strip a trailing unit letter, if there is one
    if s.endswith('M'):
        dataframe1.loc[i, 'Col1'] = s.replace('M', '')
    if s.endswith('K'):
        dataframe1.loc[i, 'Col1'] = s.replace('K', '')
dataframe1
You will get your expected output.
Note: You can add further if conditions according to your requirements.
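The same suffix stripping can also be written without an explicit loop. This is only a sketch on the sample values from above, assuming the suffix is always a single trailing 'M' or 'K'; it uses the pandas string method `Series.str.replace` with a regular expression.

```python
import pandas as pd

# Same sample values as in the loop above
dataframe1 = pd.DataFrame({'Size': ['20M', '9M', '10K', '10']})

# Remove a single trailing 'M' or 'K'; values without a suffix are left unchanged
dataframe1['Col1'] = dataframe1['Size'].str.replace(r'[MK]$', '', regex=True)

print(dataframe1['Col1'].tolist())  # ['20', '9', '10', '10']
```

The values in 'Col1' are still strings at this point, so a conversion step is needed before doing arithmetic on them.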
Upvotes: 1
Reputation: 2123
I created an example DataFrame to show the result:
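import pandas as pd
df = pd.DataFrame({'A': [1,2,1], 'B': [3,4,3], 'Size': ['Mfoobar1','kfoobar2','Varies with devicefoobar3']})
for i, v in enumerate(df['Size'].values):
    # Remove each unwanted substring in turn
    v = v.replace('M', '')
    v = v.replace('k', '')
    v = v.replace('Varies with device', '')
    # Write back by label; the default RangeIndex matches the enumerate counter
    df.loc[i, 'Size'] = v
print(df)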
Before:
   A  B                       Size
0  1  3                   Mfoobar1
1  2  4                   kfoobar2
2  1  3  Varies with devicefoobar3
After:
   A  B     Size
0  1  3  foobar1
1  2  4  foobar2
2  1  3  foobar3
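For the column in the question, a loop-free variant might look like the sketch below. It assumes the values look like '19M', '201k' or 'Varies with device', as in the question's output, and that non-numeric entries should become NaN, which is my assumption and not something the question states. The sample frame is made up. Its index has gaps on purpose: the question's output shows Length: 10829 with labels running up to 10840, which suggests rows were dropped, and in that case `range(len(...))` no longer lines up with the index labels.

```python
import pandas as pd

# Hypothetical sample; the index has gaps, as it would after dropping rows
data = pd.DataFrame(
    {'Size': ['19M', '201k', 'Varies with device', '3.6M']},
    index=[0, 1, 5, 9],
)

# Strip the unit letters from the whole column at once (no per-label lookups)
cleaned = data['Size'].astype(str).str.replace('[Mk]', '', regex=True)

# Convert to float; anything non-numeric, such as 'Varies with device', becomes NaN
data['Size'] = pd.to_numeric(cleaned, errors='coerce')

print(data['Size'].tolist())  # [19.0, 201.0, nan, 3.6]
```

Like the loop in the question, this discards the unit, so '201k' and '19M' end up on different scales unless one of them is rescaled afterwards.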
Upvotes: 1