Reputation: 51
What I have got is a CSV file with following structure:
column1 column2 column3 column4(day) column5(month&year) column6(time)
column1 column2 column3 column4(day) column5(month&year) column6(time)
column1 column2 column3 column4(day) column5(month&year) column6(time)
...
The columns of the file do not have names. Now I want to merge column4 and column5 and generate a new version of the CSV file so that I have got the complete date in one cell.
What I have tried is following Python code:
def correctDatetime():
with open("inputfile.csv", "r") as source, open("outputfile.csv", "w") as result:
df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=0)
for row in source:
df['Datetime'] = df[df.columns[3:]].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
result.write(df)
Unfortunately this only generates an empty CSV file. How could I best approach this issue? Any advice would be helpful. Thanks in advance!
Upvotes: 0
Views: 91
Reputation: 862406
Add header=None
because no header in csv file to read_csv
, then extract column 3
and 4
by DataFrame.pop
and join together, last write to file by DataFrame.to_csv
:
def correctDatetime():
df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
df['Datetime'] = df.pop(3) + ' ' + df.pop(4) + ' ' + df.pop(5)
df.to_csv("outputfile.csv", index=False, header=False)
If need convert to datetimes and format of month
with years is MMYYYY
and time column is HH:MM:SS
add to_datetime
:
def correctDatetime():
df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
s = df.pop(3) + '-' + df.pop(4) + ' ' + df.pop(4)
df['Datetime'] = pd.to_datetime(s, format='%d-%m%Y %H:%M:%S')
df.to_csv("outputfile.csv", index=False, header=False)
Upvotes: 0
Reputation: 733
import pandas as pd
#Setup DF
data = [
(1234, 1, 'x', 'x', 34, 'May 2019'),
(1234, 2, 'x', 'x', 34, 'June 2019'),
(1235, 1, 'y', 'y', 34, 'July 2019')]
df = pd.DataFrame(data)
The following code will combine your data based on column position and then drop unnecessary columns and save to some_csv.csv
:
df['Datetime']=df.apply(lambda x:'%s %s' % (str(x[df.columns[4]]),str(x[df.columns[5]])),axis=1)
df.drop(columns=[df.columns[4],df.columns[5]], inplace=True)
df.to_csv('some_csv.csv')
Upvotes: 0