techie11

Pandas Series integer data becomes NaN

I'm using a pandas Series to chart data from vmstat output. After loading the CSV, the DataFrame looks fine:

import pandas as pd
vmstat_filename = 'vmstat0607.csv'
df = pd.read_csv(vmstat_filename, sep=',')
df


                date_time  free_mem  block_ins  block_outs  interupts  context_switches  cpu_user  cpu_idle
0     2021-06-07 00:00:02  14068616          0          57        770              1022         0        99
1     2021-06-07 00:00:22  14003300          0          22        887              1095         0        99
2     2021-06-07 00:00:42  14064280          0          23        882              1051         0        99
3     2021-06-07 00:01:02  14020436          0         100        922              1085         1        98
4     2021-06-07 00:01:22  14002080          0          21        942              1179         1        99
...                   ...       ...        ...         ...        ...               ...       ...       ...
4300  2021-06-07 23:58:35   9361208          0          19       1029              1161         1        99
4301  2021-06-07 23:58:55   9361524          0          56       1029              1181         1        99
4302  2021-06-07 23:59:15   9419520          0         312       1364              1291         1        98
4303  2021-06-07 23:59:35   9363812          0          24       1417              1131         3        96
4304  2021-06-07 23:59:55   9354404          0          69       1011              1176         1        99

When I extract the free_mem column from df and use date_time as the index, every free_mem value is displayed as NaN:

ts = pd.Series(df['free_mem'], index=df['date_time'])
ts

date_time
2021-06-07 00:00:02   NaN
2021-06-07 00:00:22   NaN
2021-06-07 00:00:42   NaN
2021-06-07 00:01:02   NaN
2021-06-07 00:01:22   NaN
                       ..
2021-06-07 23:58:35   NaN
2021-06-07 23:58:55   NaN
2021-06-07 23:59:15   NaN
2021-06-07 23:59:35   NaN
2021-06-07 23:59:55   NaN
Name: free_mem, Length: 4305, dtype: float64

Why did that happen?
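A minimal, self-contained reproduction, assuming only the date_time and free_mem columns matter. The CSV file is replaced by a hypothetical in-memory DataFrame holding three rows copied from the output above:

```python
import pandas as pd

# Hypothetical stand-in for vmstat0607.csv: three rows copied from the
# output above, keeping only the two columns involved
df = pd.DataFrame({
    "date_time": ["2021-06-07 00:00:02", "2021-06-07 00:00:22", "2021-06-07 00:00:42"],
    "free_mem": [14068616, 14003300, 14064280],
})

# Same call as in the question
ts = pd.Series(df["free_mem"], index=df["date_time"])
print(ts)
print(ts.dtype)  # float64, because every entry is now NaN
```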

Answers (2)

Cimbali

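The problem is that the index of df['free_mem'] is not (yet) date_time; it is still the default integer index (0, 1, 2, ...). When you pass an existing Series to pd.Series together with index=..., pandas does not attach the new labels to the values. It reindexes instead: each date_time value is looked up in that integer index, none of them match, and every entry becomes NaN. Because NaN cannot be stored in an integer column, the dtype also changes to float64.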

You’re doing (the equivalent of) the following:

df['free_mem'].reindex(df['date_time'])
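The lookup behaviour is easier to see on a toy Series where some labels do match. The values 10, 20, 30 and the labels 1, 2, 5 are arbitrary, chosen only for illustration: labels present in the original index pick up their values, missing ones become NaN, and the result equals the explicit reindex call.

```python
import pandas as pd

# Toy Series with the default integer labels 0, 1, 2
s = pd.Series([10, 20, 30])

# Passing a Series plus index=... looks the new labels up in s's index
t = pd.Series(s, index=[1, 2, 5])
print(t)  # labels 1 and 2 exist (20, 30); label 5 does not (NaN), so dtype is float64

# Same result as an explicit reindex
print(t.equals(s.reindex([1, 2, 5])))
```

In the question, none of the date_time strings exist in the integer index, so every entry is NaN.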

Instead, first set date_time as the index of the whole DataFrame, and then select the column:

df.set_index('date_time')['free_mem']
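A sketch of this fix on three rows taken from the question's output (a hypothetical in-memory stand-in for the CSV). Because the values move together with their rows, no label lookup happens and the column keeps its integer dtype:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: three rows from the question's output
df = pd.DataFrame({
    "date_time": ["2021-06-07 00:00:02", "2021-06-07 00:00:22", "2021-06-07 00:00:42"],
    "free_mem": [14068616, 14003300, 14064280],
})

# Move date_time into the index first, then select the column
ts = df.set_index("date_time")["free_mem"]
print(ts)
print(ts.dtype)  # still an integer dtype
```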

Nk03

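pd.Series is using the values from index=df['date_time'] as labels to look up entries in the existing index of df['free_mem'] (0, 1, 2, ...), not as new labels for the data. None of the timestamps exist in that index, so every value comes back as NaN.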

Fix: pass the underlying NumPy array via .values, so there is no index for pandas to align against:

s = pd.Series(data = df['free_mem'].values, index = df['date_time'])
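A runnable sketch of this approach on three rows from the question's output (a hypothetical in-memory stand-in for the CSV). It uses .to_numpy(), which the pandas documentation recommends over .values since pandas 0.24; either returns a bare array with no index. The name="free_mem" argument is an addition here, only to keep the column name that a bare array does not carry.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: three rows from the question's output
df = pd.DataFrame({
    "date_time": ["2021-06-07 00:00:02", "2021-06-07 00:00:22", "2021-06-07 00:00:42"],
    "free_mem": [14068616, 14003300, 14064280],
})

# to_numpy() yields a plain array with no index, so the constructor
# pairs values with the new labels by position instead of by label
s = pd.Series(data=df["free_mem"].to_numpy(), index=df["date_time"], name="free_mem")
print(s)
```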
