Evan H.
Evan H.

Reputation: 149

How to iterate over columns in CSV to find which one contains rows of dates in Python (Pandas)

I have several CSV files that I need to read through inside of python3 using Pandas. I am trying to find which column (or index) of each CSV contains a column with dates. Two caveats:

  1. None of the CSV's have the same structure, some the dates column could be first, some it could be the second or third, and
  2. The dates are always in a different format, sometimes they are like "Oct 28", othertimes they are "7/11", and anything in between.

I have already began iterating through them using Pandas like so, and I can pretty quickly identify if a column contains dates using the datetime parse function. Im having trouble going column by column (through each header to check for words like "date") as well as through each row in that column to see if I can parse it as a date. Ideas?

Example CSV:

Table
DATE ,TOPIC 
Aug 21 ,"Some text here"
Aug 23 ,Some other text here, it isn't always in quotes 

Current code for iterating through the rows:

import pandas as pd

df = pd.read_csv(filePath)

for i, row in enumerate(df.itertuples(), 0):
   print(i, row.Index)

Edit: Here is my current code for iterating through each row in the Index column of a file. This works when the dates are in the index, but I need to loop through each column (not just the index) to see which one contains the dates.

for i, row in enumerate(df.itertuples(), 0):
   for keyword in dateKeywords:
       if keyword.lower() in row.Index.lower():
           foundColumn = True

       if foundColumn:
           for days in dayAndMonthList: 
                if days in row.Index:
                     canParse = is_date(row.Index)
                     print(canParse)

The above code loops through the Dataframe, and looks for keywords like "date" or "week" to see a potential dates column. If it finds one, it then goes through to see if the rows in that column all contain the name (or abbreviation) of a month). This works to parse dates in the index column, but how can I make it shift over and test each column for the keyword, and then loop through the rows in that column to check if they're parseable?

Upvotes: 2

Views: 388

Answers (1)

Arn
Arn

Reputation: 2015

Retrieving index/column pairs that refer to cells containing dates in a DataFrame:

# random setup
import pandas as pd
from dateutil.parser import parse
df = pd.DataFrame({
    "DATE": ['15/05/1965', '02/15/1994', 'Aug 23'],
    "MIX": ['ABC', 'DEF', '02/05/1995'],
    "TRICKERY": ['some chars', 'obfuscating 12/12/2000 the date', 'more chars'],
    "NAT": ['not', 'at', 'all']
})

# solution
def get_date(s):
    try:
        parse(s, fuzzy=True)
        return True
    except ValueError:
        return False

# get a boolean dataframe with True values indicating a cell is a date
df = df.applymap(get_date)

# get a list of tuples containing an index and a column name
parsable = df[df].stack().index.tolist()

>>> df
   DATE    MIX    NAT  TRICKERY
0  True  False  False     False
1  True  False  False      True
2  True   True  False     False

>>> parsable
[(0, 'DATE'), (1, 'DATE'), (1, 'TRICKERY'), (2, 'DATE'), (2, 'MIX')]

Upvotes: 1

Related Questions