Reputation: 1387
I'm using Pandas Series to chart data from vmstat output. After loading the data, the dataframe looks good:
import pandas as pd

vmstat_filename = 'vmstat0607.csv'
df = pd.read_csv(vmstat_filename, sep=',')
df
                date_time  free_mem  block_ins  block_outs  interupts  context_switches  cpu_user  cpu_idle
0     2021-06-07 00:00:02  14068616          0          57        770              1022         0        99
1     2021-06-07 00:00:22  14003300          0          22        887              1095         0        99
2     2021-06-07 00:00:42  14064280          0          23        882              1051         0        99
3     2021-06-07 00:01:02  14020436          0         100        922              1085         1        98
4     2021-06-07 00:01:22  14002080          0          21        942              1179         1        99
...                   ...       ...        ...         ...        ...               ...       ...       ...
4300  2021-06-07 23:58:35   9361208          0          19       1029              1161         1        99
4301  2021-06-07 23:58:55   9361524          0          56       1029              1181         1        99
4302  2021-06-07 23:59:15   9419520          0         312       1364              1291         1        98
4303  2021-06-07 23:59:35   9363812          0          24       1417              1131         3        96
4304  2021-06-07 23:59:55   9354404          0          69       1011              1176         1        99
When I extract the free_mem column from the df and use date_time as the index, all of the free_mem data is displayed as NaN:
ts = pd.Series(df['free_mem'], index=df['date_time'])
ts
date_time
2021-06-07 00:00:02 NaN
2021-06-07 00:00:22 NaN
2021-06-07 00:00:42 NaN
2021-06-07 00:01:02 NaN
2021-06-07 00:01:22 NaN
..
2021-06-07 23:58:35 NaN
2021-06-07 23:58:55 NaN
2021-06-07 23:59:15 NaN
2021-06-07 23:59:35 NaN
2021-06-07 23:59:55 NaN
Name: free_mem, Length: 4305, dtype: float64
Why did that happen?
Upvotes: 0
Views: 78
Reputation: 11395
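The problem is that the series index is not (yet) date_time. When you pass an existing Series as the data together with an index, pandas aligns the data to that index by label instead of relabeling it. The labels of df['free_mem'] are 0 to 4304, none of which match a date_time value, so every entry comes back as NaN.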
You’re doing (the equivalent of) the following:
df['free_mem'].reindex(df['date_time'])
Instead, try first setting date_time as the index of the whole dataframe, and then getting the column:
df.set_index('date_time')['free_mem']
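To see the difference without the original CSV, here is a minimal sketch that rebuilds three rows from the question's output by hand (the small frame is an assumption standing in for the real file; the variable names broken, reindexed and fixed are just for illustration):

```python
import pandas as pd

# Three rows copied from the question's output; the real CSV is not needed.
df = pd.DataFrame({
    "date_time": ["2021-06-07 00:00:02", "2021-06-07 00:00:22", "2021-06-07 00:00:42"],
    "free_mem": [14068616, 14003300, 14064280],
})

# Passing a Series as data makes pandas align it to the new index by label.
# The existing labels 0, 1, 2 never match the date strings, so all values are NaN.
broken = pd.Series(df["free_mem"], index=df["date_time"])

# The same alignment, written explicitly.
reindexed = df["free_mem"].reindex(df["date_time"])

# Setting the index on the frame first keeps each value attached to its row.
fixed = df.set_index("date_time")["free_mem"]

print(broken.isna().all())     # every entry is NaN
print(reindexed.isna().all())  # same here
print(fixed)
```

Note that fixed keeps its integer dtype, while the all-NaN versions are upcast to float64, which matches the dtype shown in the question.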
Upvotes: 1
Reputation: 14949
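It's using the values from index=df['date_time'] as index labels to look up values in the Series you passed in (df['free_mem']). That Series is still labeled 0 to 4304, so none of the lookups match and you get NaN everywhere.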
Fix -> use .values to access the inner numpy array:
s = pd.Series(data=df['free_mem'].values, index=df['date_time'])
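A small sketch of the same idea, using .to_numpy() (the spelling of .values that current pandas docs recommend) on a hand-built three-row frame standing in for the real CSV. Converting date_time with pd.to_datetime is an optional extra step not in the question; it is assumed here only because the series is meant for charting:

```python
import pandas as pd

# Three rows copied from the question's output; the real CSV is not needed.
df = pd.DataFrame({
    "date_time": ["2021-06-07 00:00:02", "2021-06-07 00:00:22", "2021-06-07 00:00:42"],
    "free_mem": [14068616, 14003300, 14064280],
})

# A plain NumPy array carries no labels, so pandas pairs values with the
# new index by position instead of trying to align by label.
s = pd.Series(data=df["free_mem"].to_numpy(), index=df["date_time"], name="free_mem")

# Optional: parse the strings so the index is a real DatetimeIndex.
ts = pd.Series(
    data=df["free_mem"].to_numpy(),
    index=pd.to_datetime(df["date_time"]),
    name="free_mem",
)

print(s)
print(type(ts.index).__name__)
```

With a datetime index, ts.plot() can place the points on a proper time axis, and methods such as ts.resample(...) become available.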
Upvotes: 1