Reputation: 69
I have an issue with my CSV file where I need to loop through all the string values in a specific column and replace them with something else. I have tried it with pandas, but the iteration only yields copies of the values, so my changes are not saved back to the DataFrame. The code I have tried so far is as follows:
import pandas as pd
df = pd.read_csv("cexport-4.csv", encoding="iso-8859-1", sep=";", error_bad_lines=False)
psu = ["PSU 500W - 550W", "PSU 600W - 650W"]
for row in df["KATEGOORIA"]:
    if row in psu:
        row = "PSU"
So what I need to figure out is how to take a column value such as "PSU 500W - 550W" and rename it to "PSU".
The CSV file looks like this (part of a really large CSV file):
,AK ID,TOODE,KATEGOORIA
0,330783.0,ASUS VGA PCIE16 GT730 2GB GDDR3/GT730-SL-2G-BRK-V2 ASUS,GeForce 700 Series
1,330694.0,"Chipolo Plus 2nd Generation Smart Tracker CH-CPM6-BK-R Black, Bluetooth tracker, iOS and Android phones with Bluetooth 4.0 connectivity or higher, Weight 9 g, ,Smartdevice accessories
2,330653.0,"Thermaltake Smart 500W RGB (80+ 230V EU, 2xPEG, 120mm, Single Rail) PSU",PSU 500W - 550W
I have also tried importing the CSV and doing it that way, but I do not know how to access only one column. I would really appreciate any guidance!
Best, Raidar
Upvotes: 2
Views: 766
Reputation: 862741
Create a dictionary and use replace:
psu = ["PSU 500W - 550W", "PSU 600W - 650W"]
d = dict.fromkeys(psu, 'PSU')
df["KATEGOORIA"] = df["KATEGOORIA"].replace(d)
An alternative is to use map with fillna for better performance:
df["KATEGOORIA"] = df["KATEGOORIA"].map(d).fillna(df["KATEGOORIA"])
Another approach is to replace all values starting with PSU, using numpy.where:
mask = df["KATEGOORIA"].str.startswith('PSU')
#faster if no missing values
mask = [x.startswith('PSU') for x in df["KATEGOORIA"]]
df["KATEGOORIA"] = np.where(mask, 'PSU', df["KATEGOORIA"])
Upvotes: 3