Reputation: 21
So I recently concatenated multiple csv files into one. Since the filenames were dates, I also included "filename" as a column for reference. However, the filename has info that I would not like to include such as the time and file extension. As a beginner, I'm only familiar with importing and printing the file to view. What code is recommended to mass remove the info after the date?
answer filename
7 2018-04-12 21_01_01.csv
7 2018-04-18 18_36_30.csv
7 2018-04-18 21_01_32.csv
8 2018-04-20 15_21_02.csv
7 2018-04-20 21_00_44.csv
7 2018-04-22 21_01_05.csv
Upvotes: 0
Views: 319
Reputation: 3930
When working with tabular data in python I highly recommend using the pandas package.
import pandas as pd
df = pd.read_csv("../test_data.csv")
def rem_part(string):
return string.split(' ')[0] # could also split on '.' if you want to keep the time
df['date'] = df['filename'].apply(rem_part)
df.drop('filename', axis = 1, inplace=True) # remove the filename column if you so please
df.to_csv("output.csv"). # save the file as a new CSV or overwrite the old
The test_data.csv
file contains the following:
answer,filename
7,2018-04-12 21_01_01.csv
7,2018-04-18 18_36_30.csv
7,2018-04-18 21_01_32.csv
8,2018-04-20 15_21_02.csv
7,2018-04-20 21_00_44.csv
7,2018-04-22 21_01_05.csv
Upvotes: 0
Reputation: 71560
It could be done with regular python, not that difficult, but an very easy way with pandas would be:
import pandas as pd
df = pd.read_csv(<your name of the csv here>, sep='\s\s+', engine='python')
df['filename'] = df['filename'].str.rstrip('.csv')
print(df)
Upvotes: 1