Reputation: 1896
I currently have a DataFrame which I scraped from the internet using Beautiful Soup. However, it is set up as a grid rather than a continuous list: months are the rows and years are the columns.
I am trying to turn it into one continuous column, as this data will be plotted against other data (e.g. births vs deaths).
An example of the df I currently have is below:
          2010       2011       2013       2014
Jan   1.474071  -0.064034   0.781836  -1.282782
Feb  -1.071357   0.441153   0.583787   2.353925
Mar   0.221471  -0.744471   1.729689   0.758527
Apr  -0.964980  -0.845696   1.846883  -1.340896
May  -1.328865   1.682706   0.888782  -1.717693
Jun   0.228440   0.901805   0.520260   1.171216
Jul  -1.197071  -1.066969  -0.858447  -0.303421
Aug   0.306996  -0.028665   1.574159   0.384316
Sep  -0.014805  -0.284319  -1.461665   0.650776
Oct   1.588931   0.476720  -0.242861   0.473424
Nov  -0.014805  -0.284319  -1.461665   0.650776
Dec   0.964980  -0.845696   1.846883  -1.340896
However, when I try to append the columns (with ignore_index=True), I get:
df[["2010"]].append(df[["2011"]], ignore_index=True)
00 1.474071 NaN
01 -1.071357 NaN
02 0.221471 NaN
03 -0.964980 NaN
04 -1.328865 NaN
05 0.228440 NaN
06 -1.197071 NaN
07 0.306996 NaN
08 -0.014805 NaN
09 1.588931 NaN
11 -0.014805 NaN
12 NaN -0.064034
13 NaN 0.441153
14 NaN -0.744471
15 NaN -0.845696
16 NaN 1.682706
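The NaN values appear because df[["2010"]] and df[["2011"]] are single-column DataFrames with different column labels, and appending or concatenating DataFrames aligns them on those labels. A minimal sketch, assuming string column labels (as in the question's code) and using pd.concat, since DataFrame.append was removed in pandas 2.0; it uses only the first three months of two years from the example data:

```python
import pandas as pd

# Small subset of the example data: months as rows, years as columns.
df = pd.DataFrame(
    {"2010": [1.474071, -1.071357, 0.221471],
     "2011": [-0.064034, 0.441153, -0.744471]},
    index=["Jan", "Feb", "Mar"],
)

# Single-column DataFrames are aligned on their column labels,
# so "2010" and "2011" end up as two separate, half-empty columns.
stacked_frames = pd.concat([df[["2010"]], df[["2011"]]], ignore_index=True)
print(stacked_frames)

# Single brackets select Series, which have no column label to
# align on, so concat stacks the values end to end instead.
stacked_series = pd.concat([df["2010"], df["2011"]], ignore_index=True)
print(stacked_series)
```

The only difference between the two calls is double versus single brackets, i.e. DataFrame versus Series inputs.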
What I want instead is the whole dataset in one continuous column, e.g.:
00 1.474071
01 -1.071357
02 0.221471
03 -0.964980
04 -1.328865
05 0.228440
06 -1.197071
07 0.306996
08 -0.014805
09 1.588931
11 -0.014805
12 -0.064034
13 0.441153
14 -0.744471
15 -0.845696
16 1.682706
How do I get all four columns into one single column?
Upvotes: 3
Views: 5115
Reputation: 2330
Another way to do this is to unstack the DataFrame, then reset the index to the default integer index with reset_index(drop=True):
df.unstack().reset_index(drop=True)
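A self-contained sketch of this approach on a small subset of the question's data (string year labels assumed). On a DataFrame whose index is not a MultiIndex, unstack() returns a Series indexed by (column, row) pairs with the columns as the outer level, so the values come out year by year, January onwards within each year. Skipping reset_index keeps those (year, month) labels, which may be handy when lining the series up against other monthly data:

```python
import pandas as pd

df = pd.DataFrame(
    {"2010": [1.474071, -1.071357, 0.221471],
     "2011": [-0.064034, 0.441153, -0.744471]},
    index=["Jan", "Feb", "Mar"],
)

# unstack() yields a Series whose MultiIndex is (column label, row label):
# ("2010", "Jan"), ("2010", "Feb"), ..., ("2011", "Mar")
labelled = df.unstack()

# Replace the (year, month) labels with a plain 0..n-1 integer index.
flat = labelled.reset_index(drop=True)
print(flat)
```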
Upvotes: 6
Reputation: 394389
You can create a list of the columns, calling squeeze so that each item is a Series rather than a single-column DataFrame (a Series carries no column label, so concat stacks the values instead of aligning them on columns). Then call concat on this list, passing ignore_index=True to create a new integer index; otherwise the month names are repeated as index values:
In [228]:
cols = [df[col].squeeze() for col in df]
pd.concat(cols, ignore_index=True)
Out[228]:
0 1.474071
1 -1.071357
2 0.221471
3 -0.964980
4 -1.328865
5 0.228440
6 -1.197071
7 0.306996
8 -0.014805
9 1.588931
10 -0.014805
11 0.964980
12 -0.064034
13 0.441153
14 -0.744471
15 -0.845696
16 1.682706
17 0.901805
18 -1.066969
19 -0.028665
20 -0.284319
21 0.476720
22 -0.284319
23 -0.845696
24 0.781836
25 0.583787
26 1.729689
27 1.846883
28 0.888782
29 0.520260
30 -0.858447
31 1.574159
32 -1.461665
33 -0.242861
34 -1.461665
35 1.846883
36 -1.282782
37 2.353925
38 0.758527
39 -1.340896
40 -1.717693
41 1.171216
42 -0.303421
43 0.384316
44 0.650776
45 0.473424
46 0.650776
47 -1.340896
dtype: float64
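A self-contained version of the concat approach on a subset of the question's data, also showing what ignore_index=True changes. Iterating over a DataFrame yields its column labels and df[col] already returns a Series, so this sketch leaves out squeeze():

```python
import pandas as pd

df = pd.DataFrame(
    {"2010": [1.474071, -1.071357, 0.221471],
     "2011": [-0.064034, 0.441153, -0.744471]},
    index=["Jan", "Feb", "Mar"],
)

# One Series per year, in column order.
cols = [df[col] for col in df]

# Without ignore_index the month labels are carried over and repeat.
with_months = pd.concat(cols)
print(with_months.index.tolist())

# With ignore_index=True a fresh 0..n-1 index is created instead.
flat = pd.concat(cols, ignore_index=True)
print(flat)
```

The values come out in the same year-by-year order as with df.unstack().reset_index(drop=True).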
Upvotes: 2