Reputation: 145
I got a CSV file with a column named activity
which has data like:
instv2-02_00001_20190517235008
instv2 (9)
Insti2(3)
Fbstt1_00001_20190517131933
I need to remove numbers and any other signs (example: _) from the names in the 'activity' column only.
That means need to keep just the letters.
for example instv3-02_00001_20190517235157
,
instv1-02_00000_20190517234840
, instv1 (4)...etc all need to be renamed/replaced as instv. How can I do this in a Python script?
Upvotes: 2
Views: 1843
Reputation: 1918
Using pandas
, load the CSV file and apply a regex replacement on the activity
column values.
Try this code:
import re
import pandas as pd
df = pd.read_csv('your_file.csv')
df['activity'] = df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
df.to_csv('output.csv', index=False)
and if it is related to your question here, then you just need to import re and change the last line of the solution to be like:
import re
# ...
all_df['activity'] = all_df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
all_df.to_csv('all_data.csv', index=False)
Upvotes: 1
Reputation: 1280
Using re
module and builtin open
function:
import re
with open('Alldata10.csv', 'r') as file_r, open('Alldata10_revised.csv', 'w') as file_w:
for line in file_r:
line = line.split(',')
line[0] = re.sub('[^a-zA-Z]+', '', line[0])
line = ','.join(line)
file_w.write(line)
Upvotes: 1