Reputation: 149
I want to append to a csv file, some data from redshift tables, using the pandas
module in python. From python, I can successfully connect and retrieve rows from redshift tables using the psycopg2
module. Now, I am storing datewise data on the csv. So I need to first create a new date column in the csv, then append the data retrieved in that new column.
I am using the following commands to read from redshift tables:
conn=psycopg2.connect( host='my_db_hostname', port=`portnumber`, user='username', password='password', dbname='db')
conn.autocommit = True
cur=conn.cursor()
cur.execute(""" select emp_name, emp_login_count from public.emp_login_detail where login_date=current_date """)
records=cur.fetchall()
cur.close()
Now, I want to append these emp_name and emp_login_count columns to the existing csv. Below is a snapshot of csv:
Everyday I need to add new date column in csv and then I need to put the emp_login_count against respective person's name.
I am new to Pandas and have no idea how to implement this. Can someone please help me out?
Upvotes: 0
Views: 2545
Reputation: 70
Add the following and try it out:
records=cur.fetchall()
# Create a dataframe of the SQL query's result
column_names = ['emp_name','login_count']
df = pd.DataFrame(records, columns = column_names)
df.head()
Now create another dataframe for the daily login counts csv file
df_daily = pd.read_csv('<INSERT the path_to_csv here>')
df_daily.head()
Merge the two dataframes on the 'emp_name' column
result = df.merge(df_daily, on='emp_name')
result.head()
After the join, you can rename the 'login_count' column to today's date
result.rename(columns = {'login_count':'< INSERT date here>'}, inplace = True)
You can then save the new data into a csv file again:
pd.to_csv('<INSERT name of file.csv>', index=False)
Upvotes: 2
Reputation: 31
I'm considering that everyday u'll get 1 column with the emp_name data, and other column with the emp_login_count data.
I belive u need to look at:
1 - Read the csv file with the read_csv in pandas, because this will give u a dataframe with the history information.
2 - Everyday u must create a new dataframe, with the information got in the database. If u want, u may use the datetime library and today method to get the todays date and use as a columns title.
3 - Merge the information from the csv with the new data, using merge method in pandas, u'll probably need something like: df_original.merge(df_new_information, left_on='emp_name', right_on='emp_name', how='outer')
. The left_on and right_on are the columns names that u'll use as reference.
4 - Write a csv file with the to_csv in pandas, this will create a new or replace the existing csv file.
Sorry for the bad english.
Upvotes: 2