Ayush Tiwari

Reputation: 59

How to modify CSV format data using pandas in Jupyter notebook?

I am reading a CSV file into a variable called 'data' in a Jupyter Notebook using pandas, as follows:


import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv("C:/Users/hp/Desktop/dv project/googleplaystorecleaned.csv")


I tried to modify the 'Size' column of the data set to remove the characters 'M' and 'k' (and the text 'Varies with device') using the following code:

for i in range(len(data['Size'])):
    data['Size'][i]=str(data['Size'][i])
    data['Size'][i]=data['Size'][i].replace('M','')
    data['Size'][i]=data['Size'][i].replace('k','')
    data['Size'][i]=data['Size'][i].replace('Varies with device','')
    data['Size'][i]=float(data['Size'][i])
print(data['Size']) 

The code seems to work only partially on the data set, as I am getting the following output:

0                        19
1                        14
2                       8.7
3                        25
4                       2.8
                ...        
10836                   53M
10837                  3.6M
10838                  9.5M
10839    Varies with device
10840                   19M
Name: Size, Length: 10829, dtype: object

What is the proper way to do this?
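Judging from the output above, the loop likely stops early because of the index rather than the string handling. The printed Series has Length 10829, yet its labels go up to 10840, so some rows were presumably dropped during cleaning and the index has gaps. range(len(data['Size'])) only yields 0 to 10828, while data['Size'][i] looks values up by index label, so higher labels are never reached and a missing label raises KeyError. Separately, float('') raises ValueError once 'Varies with device' has been replaced by an empty string. A toy sketch of these effects (hypothetical data, not the Play Store file):

```python
import pandas as pd

# Hypothetical toy frame: dropping the row labelled 1 leaves labels 0, 2, 3
df = pd.DataFrame({'Size': ['19M', '14M', '201k', '8.7M']}).drop(index=1)

# range(len(...)) yields positions 0, 1, 2, which are not the same as the labels
positions = list(range(len(df['Size'])))
missed = [label for label in df.index if label not in positions]
print(missed)  # [3]: never visited by a range-based loop

# Looking up the dropped label 1 by label fails
key_error = False
try:
    df['Size'][1]
except KeyError:
    key_error = True
print(key_error)  # True

# Iterating over (label, value) pairs reaches every remaining row
visited = [label for label, _ in df['Size'].items()]
print(visited)  # [0, 2, 3]

# An emptied string cannot be converted to float
float_failed = False
try:
    float('')
except ValueError:
    float_failed = True
print(float_failed)  # True
```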

Upvotes: 1

Views: 3187

Answers (2)

The Guy

Reputation: 421

Hi, you can also try this:

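import pandas as pd

list1 = ['20M', '9M', '10K', '10']
dataframe1 = pd.DataFrame(data=list1, columns=['Size'])

# Col1 starts as a copy of Size, so values without a unit (e.g. '10') are kept
dataframe1['Col1'] = dataframe1['Size']

for i, s in enumerate(dataframe1['Size']):
    # .loc writes into the frame directly (avoids chained assignment)
    if s.endswith('M'):
        dataframe1.loc[i, 'Col1'] = s.replace('M', '')
    if s.endswith('K'):
        dataframe1.loc[i, 'Col1'] = s.replace('K', '')

dataframe1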

Col1 then holds the size values with the unit letter removed.

Note: you can add further if conditions to suit your data (for example, the question's data uses a lowercase 'k').
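For comparison, a loop-free sketch of the same idea using pandas' vectorized string accessor: Series.str.replace with a regular expression strips one trailing 'M', 'K' or 'k' from every value in a single call. The regex, and handling both cases of 'k', are assumptions about the data rather than part of the answer above.

```python
import pandas as pd

list1 = ['20M', '9M', '10K', '10']
dataframe1 = pd.DataFrame(data=list1, columns=['Size'])

# Remove one trailing unit letter (M, K or k); values without a unit are left unchanged
dataframe1['Col1'] = dataframe1['Size'].str.replace(r'[MKk]$', '', regex=True)
print(dataframe1)
```

Because it works column-wise instead of by row number, this approach is unaffected by gaps in the index.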

Upvotes: 1

Phoenixo

Reputation: 2123

I created an example dataframe to show the result:

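import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3],
                   'Size': ['Mfoobar1', 'kfoobar2', 'Varies with devicefoobar3']})

# Iterate over (index label, value) pairs so non-contiguous indexes also work
for idx, v in df['Size'].items():
    v = v.replace('M', '')
    v = v.replace('k', '')
    v = v.replace('Varies with device', '')
    df.at[idx, 'Size'] = v  # write the cleaned string back by label
print(df)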

Before:

    A   B   Size
0   1   3   Mfoobar1
1   2   4   kfoobar2
2   1   3   Varies with devicefoobar3

After:

    A   B   Size
0   1   3   foobar1
1   2   4   foobar2
2   1   3   foobar3
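Since the question ultimately calls float() on each value, here is a sketch that goes one step further on data shaped like the question's 'Size' column. It assumes 'M' means megabytes and 'k' kilobytes (so 'k' values are divided by 1024 to keep everything in MB), and it lets pd.to_numeric(errors='coerce') turn 'Varies with device' into NaN instead of an empty string. The sample values and the 1024 factor are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample in the style of the question's 'Size' column
data = pd.DataFrame({'Size': ['19M', '8.7M', '201k', 'Varies with device', '53M']})

size = data['Size'].astype(str)
# Number part: strip a trailing 'M' or 'k'; anything non-numeric becomes NaN
number = pd.to_numeric(size.str.replace(r'[Mk]$', '', regex=True), errors='coerce')
# Assumption: 'k' means kilobytes, so divide by 1024 to express everything in MB
data['Size'] = number.where(~size.str.endswith('k'), number / 1024)
print(data['Size'])
```

The result is a float64 column, so missing sizes show up as NaN and can be handled with dropna() or fillna() rather than failing inside a loop.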

Upvotes: 1
