Reputation: 105
I have to join two tables and create a table with dates, but my code is way to long and I believe that I done it the super long way.Apparently the soulution to this only had 22 lines. Is there another way and more shorter way to approach this problem. Here is the question
HERE IS MY CODE, and again I believe it is to long and I think there is a shorter way to do this.
import numpy as np
import pandas as pd
import datetime
#YOUR CODE GOES HERE#
def get_month(i):
"""this function returns the number of the month based on stringinput"""
if i == "January":
return 1
elif i == "February":
return 2
elif i == "March":
return 3
elif i == "April":
return 4
elif i == "May":
return 5
elif i == "June":
return 6
elif i == "July":
return 7
elif i == "August":
return 8
elif i == "September":
return 9
elif i == "October":
return 10
elif i == "November":
return 11
elif i == "December":
return 12
def get_reformatted_date(s):
"""this function reformats a datetime object to the output we're looking for"""
return s.strftime("%d-%b-%y")
month_names = []
tab1 = pd.read_csv("data1.csv")
tab2 = pd.read_csv("data2.csv")
tab1_tweets = tab1['Tweet'].tolist()[::-1]
tab2_tweets = tab2['Tweet'].tolist()[::-1]
tab1_months = tab1['Month'].tolist()[::-1]
tab2_months = tab2['Month'].tolist()[::-1]
tab1_days = tab1['Day'].tolist()[::-1]
tab2_days = tab2['Day'].tolist()[::-1]
tab1_years = tab1['Year'].tolist()[::-1]
tab2_years = tab2['Year'].tolist()[::-1]
all_dates = []
all_tweets = []
tab1_count = 0
tab2_count = 0
for i in range(len(tab1_tweets) + len(tab2_tweets)):
if(tab1_count < len(tab1_years) and tab2_count < len(tab2_years)):
t1_date = datetime.date(tab1_years[tab1_count], tab1_months[tab1_count], tab1_days[tab1_count])
t2_date = datetime.date(tab2_years[tab2_count], get_month(tab2_months[tab2_count]), tab2_days[tab2_count])
if t1_date > t2_date:
all_dates.append(t1_date)
all_tweets.append(tab1_tweets[tab1_count])
tab1_count += 1
else:
all_dates.append(t2_date)
all_tweets.append(tab2_tweets[tab2_count])
tab2_count += 1
elif(tab2_count < len(tab2_years)):
t2_date = datetime.date(tab2_years[tab2_count], get_month(tab2_months[tab2_count]), tab2_days[tab2_count])
all_dates.append(t2_date)
all_tweets.append(tab2_tweets[tab2_count])
tab2_count += 1
else:
t1_date = datetime.date(tab1_years[tab1_count], tab1_months[tab1_count], tab1_days[tab1_count])
all_dates.append(t1_date)
all_tweets.append(tab1_tweets[tab1_count])
tab1_count += 1
table_data = {'Date': all_dates, 'Tweet': all_tweets}
df = pd.DataFrame(table_data)
df['Date'] = df['Date'].apply(get_reformatted_date)
print(df)
data1.csv
is
Tweet Month Day Year
Hello World 6 2 2013
I want ice-cream! 7 23 2013
Friends will be friends 9 30 2017
Done with school 12 12 2017
the data2.csv
is
Month Day Year Hour Tweet
January 2 2015 12 Happy New Year
March 21 2016 7 Today is my final
May 30 2017 23 Summer is about to begin
July 15 2018 11 Ocean is still cold
Upvotes: 2
Views: 2308
Reputation: 51395
I think that you can theoretically do this whole thing in one line:
finaldf = (pd.concat([pd.read_csv('data1.csv',
parse_dates={'Date':['Year', 'Month', 'Day']}),
pd.read_csv('data2.csv',
parse_dates={'Date':['Year', 'Month', 'Day']})
[['Date', 'Tweet']]])
.sort_values('Date', ascending=False))
But for the sake of readability, its better to split it into a few lines:
df1 = pd.read_csv('data1.csv', parse_dates={'Date':['Year', 'Month','Day']})
df2 = pd.read_csv('data2.csv', parse_dates={'Date':['Year', 'Month','Day']})
finaldf = (pd.concat([df1, df2[['Date', 'Tweet']]])
.sort_values('Date', ascending=False))
I think that for what you're trying to do, the main things to read up about are the parse_dates
argument of pandas read_csv
, and pd.concat
to concatenate dataframes
Edit: in order to get the dates in the correct format as you have in your example output, you can call this after the code above, using Series.dt.strftime()
:
finaldf['Date'] = finaldf['Date'].dt.strftime('%d-%b-%y')
Upvotes: 1