CSV filtering and ascending order

Question

New to Python, so I need a bit of help.

I have a CSV file that has an id, created_at date, first/last name columns.

id  created_at  first_name last_name
1   1309380645  Cecelia    Holt
2   1237178109  Emma       Allison
3   1303585711  Desiree    King
4   1231175716  Sam        Davidson

I want to filter the rows between two dates lets say 03-22-2016 and 04-15-2016(dates don't really matter), and then order those rows in ascending order (by created_at)

I know this code will just show all or most of the data

import csv
from datetime import datetime

with open("sample_data.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        print(" ".join(row[]))

But I'm not sure how to do the rest, or how to filter using this timestamp 1309380645 would using pandas be more beneficial for me, over using csv?

Any help is much appreciated or a guide/book to read for more understanding.

shish023 · Accepted Answer

I recommend using pandas since it will help you filter and perform further analysis faster.

# import pandas and datetime
import pandas as pd
import datetime

# read csv file
df = pd.read_csv("sample_data.csv")

# convert created_at from unix time to datetime
df['created_at'] = pd.to_datetime(df['created_at'], unit='s')

# contents of df at this point
#   id          created_at first_name last_name
# 0   1 2011-06-29 20:50:45    Cecelia      Holt
# 1   2 2009-03-16 04:35:09       Emma   Allison
# 2   3 2011-04-23 19:08:31    Desiree      King
# 3   4 2009-01-05 17:15:16        Sam  Davidson

# filtering example
df_filtered = df[(df['created_at'] <= datetime.date(2011,3,22))]

# output of df_filtered
#    id          created_at first_name last_name
# 1   2 2009-03-16 04:35:09       Emma   Allison
# 3   4 2009-01-05 17:15:16        Sam  Davidson

# filter based on dates mentioned in the question
df_filtered = df[(df['created_at'] >= datetime.date(2016,3,22)) & (df['created_at'] <= datetime.date(2016,4,15))]

# output of df_filtered would be empty at this point since the 
# dates are out of this range

# sort
df_sorted = df_filtered.sort_values(['created_at'])

Explanation of filtering in pandas:

First thing that you need to know is that using a comparison operator on a dataframe returns a dataframe with boolean values.

df['id'] > 2

Would return

False
False
 True
 True

Now, pandas supports logical indexing. So if you pass a dataframe with boolean values to pandas, if will return only the ones that correspond to True.

df[df['id'] > 2]

Returns

3   1303585711  Desiree    King
4   1231175716  Sam        Davidson

This is how you can filter easily in pandas

CSV filtering and ascending order

Answers (2)

Explanation of filtering in pandas:

Related Questions