jeleung2
jeleung2

Reputation: 21

How to mass delete unwanted info in column in a CSV file?

So I recently concatenated multiple csv files into one. Since the filenames were dates, I also included "filename" as a column for reference. However, the filename has info that I would not like to include such as the time and file extension. As a beginner, I'm only familiar with importing and printing the file to view. What code is recommended to mass remove the info after the date?

answer  filename
7       2018-04-12 21_01_01.csv
7       2018-04-18 18_36_30.csv
7       2018-04-18 21_01_32.csv
8       2018-04-20 15_21_02.csv
7       2018-04-20 21_00_44.csv
7       2018-04-22 21_01_05.csv

Upvotes: 0

Views: 319

Answers (2)

Jurgen Strydom
Jurgen Strydom

Reputation: 3930

When working with tabular data in python I highly recommend using the pandas package.

import pandas as pd
df = pd.read_csv("../test_data.csv")
def rem_part(string):
    return string.split(' ')[0]  # could also split on '.' if you want to keep the time
df['date'] = df['filename'].apply(rem_part)
df.drop('filename', axis = 1, inplace=True)  # remove the filename column if you so please
df.to_csv("output.csv"). # save the file as a new CSV or overwrite the old

The test_data.csv file contains the following:

answer,filename
7,2018-04-12 21_01_01.csv
7,2018-04-18 18_36_30.csv
7,2018-04-18 21_01_32.csv
8,2018-04-20 15_21_02.csv
7,2018-04-20 21_00_44.csv
7,2018-04-22 21_01_05.csv

Upvotes: 0

U13-Forward
U13-Forward

Reputation: 71560

It could be done with regular python, not that difficult, but an very easy way with pandas would be:

import pandas as pd
df = pd.read_csv(<your name of the csv here>, sep='\s\s+', engine='python')
df['filename'] = df['filename'].str.rstrip('.csv')
print(df)

Upvotes: 1

Related Questions