Hasa
Hasa

Reputation: 145

Remove letters and signs from csv file - python 3.7

I got a CSV file with a column named activity which has data like:

instv2-02_00001_20190517235008
instv2 (9)
Insti2(3)
Fbstt1_00001_20190517131933

I need to remove numbers and any other signs (example: _) from the names in the 'activity' column only. That means need to keep just the letters. for example instv3-02_00001_20190517235157, instv1-02_00000_20190517234840, instv1 (4)...etc all need to be renamed/replaced as instv. How can I do this in a Python script?

Upvotes: 2

Views: 1843

Answers (2)

mohd4482
mohd4482

Reputation: 1918

Using pandas, load the CSV file and apply a regex replacement on the activity column values.

Try this code:

import re
import pandas as pd

df = pd.read_csv('your_file.csv')
df['activity'] = df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
df.to_csv('output.csv', index=False)

and if it is related to your question here, then you just need to import re and change the last line of the solution to be like:

import re

# ...

all_df['activity'] = all_df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
all_df.to_csv('all_data.csv', index=False)

Upvotes: 1

Masoud
Masoud

Reputation: 1280

Using re module and builtin open function:

import re

with open('Alldata10.csv', 'r') as file_r, open('Alldata10_revised.csv', 'w') as file_w:
    for line in file_r:
        line = line.split(',')
        line[0] = re.sub('[^a-zA-Z]+', '', line[0])
        line = ','.join(line)
        file_w.write(line)

Upvotes: 1

Related Questions