caricature

Reputation: 177

Python Pandas: Dataframe is not updating using string methods

I'm trying to update the strings in a .csv file that I read with Pandas. The .csv has a column named 'About' that holds the rows of data I want to manipulate.

I've already used the .str methods to update the column, but the changes are not reflected in the exported .csv file. Some of my code is below.

import pandas as pd

df = pd.read_csv('data.csv')
df.About.str.lower() #About is the column I am trying to update
df.About.str.replace('[^a-zA-Z ]', '')
df.to_csv('newdata.csv')

Upvotes: 0

Views: 763

Answers (3)

Karn Kumar

Reputation: 8816

Example Dataframe:

>>> df
        About
0      JOHN23
1     PINKO22
2   MERRY jen
3  Soojan San
4      Remo55

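Solution: another way, using a compiled regex:

>>> import re
>>> regex_pat = re.compile(r'[^a-z]+$')  # assumed definition, inferred from the explanation below
>>> df.About.str.lower().str.replace(regex_pat, '', regex=True)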
0          john
1         pinko
2     merry jen
3    soojan san
4          remo
Name: About, dtype: object

Explanation:

[^a-z]+ matches a single character not present in the list a-z, repeated by the + quantifier.

a-z is a single character in the range between a (index 97) and z (index 122) (case sensitive).

+ is a quantifier: it matches between one and unlimited times, as many times as possible, giving back as needed (greedy).

$ asserts position at the end of a line.
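To check the pattern described above outside pandas, here is a minimal sketch using only the standard re module. The pattern [^a-z]+$ is assumed from the explanation, and the names trailing_junk, samples and cleaned are made up for illustration; the sample strings are the lowercased rows of the example dataframe.

```python
import re

# Assumed pattern, taken from the explanation above:
# a run of characters outside a-z, anchored at the end of the string.
trailing_junk = re.compile(r'[^a-z]+$')

# Lowercased values of the example 'About' column.
samples = ['john23', 'pinko22', 'merry jen', 'soojan san', 'remo55']

# Strip the trailing non-letter run from each value.
cleaned = [trailing_junk.sub('', s) for s in samples]
print(cleaned)  # ['john', 'pinko', 'merry jen', 'soojan san', 'remo']
```

Because of the $ anchor, only trailing characters are removed, which is why the space inside 'merry jen' survives.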

Upvotes: 1

jezrael

Reputation: 862841

You need to assign the output back to the column, because the .str methods return a new Series instead of modifying the dataframe. Both operations can also be chained, since they work on the same column About. And because the values are already converted to lowercase, the regex no longer needs the uppercase range A-Z:

df = pd.read_csv('data.csv')
# regex=True is required on pandas >= 2.0, where the default is a literal match
df.About = df.About.str.lower().str.replace('[^a-z ]', '', regex=True)
df.to_csv('newdata.csv', index=False)

Sample:

df = pd.DataFrame({'About':['AaSD14%', 'SDD Aa']})

df.About = df.About.str.lower().str.replace('[^a-z ]', '', regex=True)
print(df)
    About
0    aasd
1  sdd aa
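
As a quick check of why the assignment matters, the sketch below reuses the sample data above. The names unchanged and updated are illustrative only, and regex=True is passed explicitly on the assumption that a recent pandas version is in use, where str.replace treats the pattern literally by default.

```python
import pandas as pd

df = pd.DataFrame({'About': ['AaSD14%', 'SDD Aa']})

# Calling a string method on its own builds a new Series; df is left untouched.
df['About'].str.lower()
unchanged = df['About'].tolist()

# Assigning the result back is what actually updates the column.
df['About'] = df['About'].str.lower().str.replace('[^a-z ]', '', regex=True)
updated = df['About'].tolist()

print(unchanged)  # ['AaSD14%', 'SDD Aa']
print(updated)    # ['aasd', 'sdd aa']
```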

Upvotes: 1

DirtyBit

Reputation: 16782

import pandas as pd

columns = ['About']
data = ["ALPHA", "OMEGA", "ALpHOmGA"]
df = pd.DataFrame(data, columns=columns)
# Assign the result back to the column; regex=True is needed on pandas >= 2.0
df.About = df.About.str.lower().str.replace('[^a-zA-Z ]', '', regex=True)
print(df)

OUTPUT:

      About
0     alpha
1     omega
2  alphomga

Upvotes: 1
