Reputation: 469
I already asked this question, but it seems it was unclear, so let me ask it in a different way. I have four .csv files named I_earthquake2016.csv, I_earthquake2017.csv, I_earthquake2018.csv and I_earthquake2019.csv (earthquake data for different years). They all have the same columns; only the number of rows differs. I wrote some code to read one of the files and make a histogram showing how many earthquakes happened in each month.
Questions:
Can anybody please teach me how to do it? Thank you.
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = pd.read_csv('I_earthquake2017.csv')
print(data[:1])
Output (first row):
time latitude longitude depth mag
0 2017-12-30 20:53:24.700000+00:00 29.4481 51.9793 10.0 4.9
data['time']=pd.to_datetime(data['time'])
data['MONTH']=data['time'].dt.month
data['YEAR']=data['time'].dt.year
print(data[:1])
Output (first row):
time latitude longitude depth mag MONTH YEAR
0 2017-12-30 20:53:24.700000+00:00 29.4481 51.9793 10.0 4.9 12 2017
plt.hist(x=[data.MONTH],bins=12,alpha=0.5)
plt.show()
Upvotes: 2
Views: 2030
Reputation: 1822
EDIT: Added sorted() to the assignment of csv_list so that the subplots appear in the right order.
Changed line: csv_list = sorted(list(base_dir.glob("*.csv")))
I simulated your data (for those interested, the code for the simulation is in the last part of this answer).
Necessary imports for the code:
#!/usr/bin/env python3
import calendar
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
There is also the glob module, but I prefer pathlib's implementation of glob (both are part of the standard library). Both let you search for a shell-style wildcard pattern (like *.csv; note that this is not a regular expression), see the quote from the docs below:
Glob the given relative pattern in the directory represented by this path, yielding all matching files (of any kind)
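To illustrate the globbing step in isolation, here is a minimal sketch that creates a few empty files in a temporary directory and collects only the CSV files, sorted by name. The helper name list_csv_files and the file notes.txt are made up for this example.

```python
import tempfile
from pathlib import Path


def list_csv_files(directory):
    """Return the CSV files directly inside `directory`, sorted by name."""
    # "*.csv" is a shell-style wildcard, not a regular expression
    return sorted(Path(directory).glob("*.csv"))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files, deliberately out of order
    for name in ["I_earthquake2017.csv", "I_earthquake2016.csv", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in list_csv_files(tmp)]

print(found)  # ['I_earthquake2016.csv', 'I_earthquake2017.csv']
```

Sorting matters because glob does not guarantee any particular order, and the plots further down rely on the files being in year order.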
The code below gives you a list of pandas DataFrames. The argument parse_dates=['time'] automatically converts the column time to datetime, so you don't need pd.to_datetime() anymore. You will need to adapt the path in base_dir to match the correct directory on your PC.
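# Read in multiple CSV files
base_dir = Path("C:/Test/Earthquake-Data")
csv_list = sorted(list(base_dir.glob("*.csv")))
# index_col=0 uses the first column as the index (the simulated files contain a saved index);
# drop this argument if your files have no such column
df_list = [pd.read_csv(file, index_col=0, parse_dates=['time']) for file in csv_list]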
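As a quick check of what parse_dates does, the sketch below reads a small in-memory CSV instead of a file. The first data row is taken from the question; the second row is invented for illustration. index_col is left out here because this sample has no index column.

```python
import io

import pandas as pd

# In-memory stand-in for one of the CSV files (second row is made up)
csv_text = (
    "time,latitude,longitude,depth,mag\n"
    "2017-12-30 20:53:24.700000+00:00,29.4481,51.9793,10.0,4.9\n"
    "2017-03-02 01:10:00.000000+00:00,30.1000,50.2000,8.0,4.1\n"
)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# The column is already a datetime column, so the .dt accessor works directly
months = df["time"].dt.month.tolist()
print(df["time"].dtype)
print(months)  # [12, 3]
```

If the dtype printed for time is object rather than a datetime type, the strings were not parsed, and calling pd.to_datetime() on the column explicitly is the usual fallback.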
You can create a 2 x 2 grid of subplots with plt.subplots(). In the code below I iterate over the list of DataFrames together with the list of axes using zip(df_list, fig.get_axes()) and unpack each resulting (df, ax) tuple into the two variables df and ax. In the loop I use the vectorized .dt.month accessor on the time column to create the histogram and change some of the appearance parameters, i.e.:
the title of each plot is set to the year: title=str(df['time'].dt.year[0])
the x-tick labels are set to the month abbreviations: list(calendar.month_abbr[1:])
Note that calendar is imported in the first part of my answer (above). Code:
fig, axes = plt.subplots(2, 2)
for df, ax in zip(df_list, fig.get_axes()):
    df['time'].dt.month.plot(kind="hist", ax=ax, bins=12, title=str(df['time'].dt.year[0]))
    ax.set_xticks(range(1, 13))
    ax.set_xticklabels(list(calendar.month_abbr[1:]))
    # Rotate the xticks for increased readability
    for tick in ax.get_xticklabels():
        tick.set_rotation(45)
fig.tight_layout()
plt.show()
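One thing to be aware of with bins=12: when the months run from 1 to 12, the histogram splits that range into 12 bins of width 11/12, so the bars do not sit exactly on the integer ticks. A possible alternative, sketched here and not part of the original approach, is to count the events per month explicitly and draw a bar chart. The function name monthly_counts and the sample dates are made up for this example.

```python
import pandas as pd


def monthly_counts(times):
    """Count events per calendar month (1..12); months without events get 0."""
    months = pd.Series(pd.to_datetime(times)).dt.month
    return months.value_counts().reindex(range(1, 13), fill_value=0)


# Three invented timestamps: two in January, one in December
counts = monthly_counts(["2017-01-05", "2017-01-20", "2017-12-30"])
print(counts.loc[1], counts.loc[12], counts.sum())  # 2 1 3
```

Inside the loop above, monthly_counts(df['time']).plot(kind="bar", ax=ax) could then take the place of the histogram call, with one bar per month. Note that a bar chart places its bars at positions 0 to 11, so the set_xticks(range(1, 13)) line would need adjusting.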
Code used to simulate the data:
#!/usr/bin/env python3
from pathlib import Path
import numpy as np
import pandas as pd
# random_datetimes appears to come from the answerer's own helper module
from my_utils.advDateTime import random_datetimes
year_range = range(2016, 2020)
# 100 random timestamps for each year
time = [random_datetimes(pd.to_datetime(f"1/1/{year}"), pd.to_datetime(f"1/1/{year + 1}"), n=100)
        for year in year_range]
# 100 random integer values for each year
lattitude = [np.random.randint(0, 100, 100) for i in range(4)]
list_dfs = [pd.DataFrame({'Lattitude': data, 'time': y}).sort_values("time").reset_index(drop=True)
            for data, y in zip(lattitude, time)]
# Export to CSV (one file per year)
base_dir = Path("C:/Test/Earthquake-Data")
for df, year in zip(list_dfs, year_range):
    df.to_csv(base_dir / f"I_earthquake{year}.csv")
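Since random_datetimes is imported from a module that is not shown, the simulation cannot be run as posted. Below is a hypothetical stand-in with the same call signature as used above; how the original helper works is unknown, so this is only one way it could be written. The seed parameter is an addition for reproducibility.

```python
import numpy as np
import pandas as pd


def random_datetimes(start, end, n=100, seed=None):
    """Hypothetical stand-in: n random timestamps t with start <= t < end."""
    rng = np.random.default_rng(seed)
    # Work on nanoseconds since the epoch, then convert back to datetimes
    start_ns = pd.Timestamp(start).value
    end_ns = pd.Timestamp(end).value
    return pd.to_datetime(rng.integers(start_ns, end_ns, size=n))


times = random_datetimes("2017-01-01", "2018-01-01", n=50, seed=0)
print(len(times), times.min().year, times.max().year)
```

The result is a DatetimeIndex, which can be passed straight into the pd.DataFrame constructor as the time column.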
Upvotes: 1