Remove numbers and signs from csv file - python 3.7

Question

I have a CSV file with a column named activity that contains values like:

instv2-02_00001_20190517235008
instv2 (9)
Insti2(3)
Fbstt1_00001_20190517131933

I need to remove numbers and any other signs (for example, _) from the names in the 'activity' column only, so that just the letters are kept. For example, instv3-02_00001_20190517235157, instv1-02_00000_20190517234840, instv1 (4), etc. all need to be replaced with instv. How can I do this in a Python script?

mohd4482 · Accepted Answer

Using pandas, load the CSV file and apply a regex replacement on the activity column values.

Try this code:

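import re
import pandas as pd

df = pd.read_csv('your_file.csv')
# Keep only the leading run of letters in each value, e.g. 'instv2 (9)' -> 'instv'.
# Assumes the column has no missing values; re.sub raises TypeError on NaN.
df['activity'] = df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
# Write the cleaned data to a new file without the index column
df.to_csv('output.csv', index=False)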
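
To see what the pattern does before running it on a whole file, here is a small check against the sample values from the question. It uses only the re module; the helper name leading_letters is made up for illustration and is not part of the answer above.

```python
import re

# Sample values copied from the question
samples = [
    "instv2-02_00001_20190517235008",
    "instv2 (9)",
    "Insti2(3)",
    "Fbstt1_00001_20190517131933",
]

def leading_letters(value):
    # Hypothetical helper: keep the leading run of letters, drop the rest.
    # If the value does not start with a letter, the pattern does not match
    # and the value is returned unchanged.
    return re.sub(r'^([a-zA-Z]+).*', r'\1', value)

cleaned = [leading_letters(s) for s in samples]
print(cleaned)  # ['instv', 'instv', 'Insti', 'Fbstt']
```

Note that the pattern keeps the original casing, so Insti2(3) becomes Insti rather than instv.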

If this is related to your other question, you just need to import re and change the last line of that solution to:

import re

# ...

all_df['activity'] = all_df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
all_df.to_csv('all_data.csv', index=False)
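
If pandas is not available, the same replacement can probably be done with the standard csv module. This is only a sketch under assumptions: the in-memory text stands in for the real file, the value column is invented for the example, and clean_activity is a hypothetical helper name.

```python
import csv
import io
import re

LEADING_LETTERS = re.compile(r'^([a-zA-Z]+).*')

def clean_activity(rows, column="activity"):
    """Yield a copy of each row with `column` reduced to its leading letters."""
    for row in rows:
        row = dict(row)
        # Treat a missing value as an empty string so sub() always gets text
        row[column] = LEADING_LETTERS.sub(r'\1', row[column] or "")
        yield row

# In-memory stand-in for the input file; the "value" column is made up
source = io.StringIO(
    "activity,value\n"
    "instv2-02_00001_20190517235008,1\n"
    "instv2 (9),2\n"
    "Insti2(3),3\n"
)

reader = csv.DictReader(source)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(clean_activity(reader))
print(out.getvalue())
```

For real files, replace the two StringIO objects with open('your_file.csv', newline='') and open('output.csv', 'w', newline='').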
