Reputation: 2617
I have a number of files that look like this.
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1997 1.840% -0.680% 0.480% 1.550% 1.510% 1.750% 2.630% -0.190% 2.960% 2.180% 0.610% 0.710%
1998 -0.470% 1.270% 2.130% 1.200% 0.880% 1.790% -0.800% -1.000% 1.080% 0.480% 0.710% 2.930%
Is there any way to convert files like this cleanly into a pandas Series?
Upvotes: 0
Views: 714
Reputation: 2617
currently, my solution is this, but it feels pretty adhoc.
df = pd.read_excel("file.xls")
df = df.set_index("year")
frame = df.stack()
new_index = []
for item in frame.index:
ts = pd.Timestamp(item[1] +"-1-"+ str(int(item[0])))
new_index.append(ts)
frame.index = new_index
Upvotes: 0
Reputation: 485
I'm not sure whether your question includes parsing the files or not, so here it goes:
First, we parse the (csv) file, making sure to specify that it is whitespace-delimited:
df = pd.read_csv('data.csv', delim_whitespace=True)
delim_whitespace
is nicer than sep=" ", because it interprets any number of successive whitespaces as a single delimeter.
Then, we melt
the dataframe to merge the rows and columns together (i.e. 'Jan' column and '1997' row become a single 'Jan 1997' row with the correct percentage value).
df = pd.melt(df, id_vars=["Year"], var_name="Month", value_name = "Percentage")
Now, we do some cleaning up: merging the 'Month' and 'Year' columns together, dropping the 'Year' column, parsing the strings as datetime and sorting by date.
df['Month'] = df.Month + " " + df.Year.map(str)
df = df.drop('Year', axis=1)
df["Month"] = pd.to_datetime(df.Month, format="%b %Y", dayfirst=True)
df = df.sort("Month")
df = df.set_index("Month")
Finally, we can convert our DataFrame into a Series:
series = df.ix[:,0]
The final result gives us the following Series:
Month
1997-01-01 1.840%
1997-02-01 -0.680%
1997-03-01 0.480%
...
1998-10-01 0.480%
1998-11-01 0.710%
1998-12-01 2.930%
Name: Percentage, dtype: object
Hope this helps!
Upvotes: 3