John
John

Reputation: 13

How to get specific information from a csv file using pandas?

I want to extract specific information from [this csv file][1].

I need make a list of days and give an overview.

Upvotes: 0

Views: 119

Answers (3)

Alex
Alex

Reputation: 1

I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.

Since your Date/Time column represents a particular hour, you'll need to do some grouping to get the minimum visibility for a particular day. The following will find the 5 least-visible days.

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Group on the new "Date" column and get the minimum values of
# each column for each group.
>>> min_by_day = data.groupby("Date").min()

# Now we can use nsmallest, since 1 row == 1 day in min_by_day.
# Since `nsmallest` returns a pandas.Series with "Date" as the index,
# we have to use `.index` to pull the date objects from the result.
>>> least_visible_days = min_by_day.nsmallest(5, "Visibility (km)").index

Then you can limit your original dataset to the least-visible days with

data[data["Date"].isin(least_visible_days)]

I also need the total number of foggy days.

We can use the extracted date in this case too:

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Filter on hours which have foggy weather
>>> foggy = data[data["Weather"].str.contains("Fog")]

# Count number of unique days
>>> len(foggy["Date"].unique())

Upvotes: 0

user17242583
user17242583

Reputation:

You're looking for DataFrame.resample. Based on a specific column, it will group the rows of the dataframe by a specific time interval.

First you need to do this, if you haven't already:

data['Date/Time'] = pd.to_datetime(data['Date/Time'])

Get the lowest 5 days of visibility:

>>> df.resample(rule='D', on='Date/Time')['Visibility (km)'].mean().nsmallest(5)
Date/Time
2012-03-01    2.791667
2012-03-14    5.350000
2012-12-27    6.104167
2012-01-17    6.433333
2012-02-01    6.795833
Name: Visibility (km), dtype: float64

Basically what that does is this:

  1. Groups all the rows by day
  2. Converts each group to the average value of all the Visibility (km) items for that day
  3. Returns the 5 smallest

Count the number of foggy days

>>> df.resample(rule='D', on='Date/Time').apply(lambda x: x['Weather'].str.contains('Fog').any()).sum()
78

Basically what that does is this:

  1. Groups all the rows by day
  2. For each day, adds a True if any row inside that day contains 'Fog' in the Weather column, False otherwise
  3. Counts how many True's there were, and thus the number of foggy days.

Upvotes: 1

amjad khatabi
amjad khatabi

Reputation: 106

This will get you an array of all unique foggy days. you can use the shape method to get its dimension

 df[df["Weather"].apply(lambda x : "Fog" in x)]["Date/Time"].unique()

Upvotes: 0

Related Questions