How to split a dataframe into parts by specific values of a column?

Question

I have a pandas dataframe that looks like this:

   Store  Sales  year  month  day
0      1   5263  2015      7   31
1      1   5020  2015      7   30
2      1   4782  2015      7   29
3      2   5011  2015      8   28
4      2   6102  2015      9   27
[986159 rows x 5 columns]

I need to split the data into two sets: one containing the rows where month is 8 or 9, and the other containing the rest of the training data.

I tried this, but it doesn't work:

# Dataframe with 8 and 9 months
train_X1 = train[train['month'] == 9 or train['month'] == 8]
# The rest of the data
train_X2 = train[train['month'] != 9 or train['month'] != 8]

I could also do this, but it only gets me the rows for months 8 and 9; the rest of the data isn't captured:

train8 = train[train['month'] == 8]
train9 = train[train['month'] == 9]
train89 = train8 + train9

How do I split the dataframe into 2 parts, where one of them has specific values, without splitting it twice? (maybe with dataframe.query() or pandas.train_test_split()?)

Jonah Williams · Accepted Answer

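The boolean expressions are written incorrectly. Python's 'or' and 'and' cannot combine pandas Series element-wise; use '|' (or) and '&' (and) instead, and wrap each comparison in parentheses, because these operators bind more tightly than '==' and '!='. Also note that the complement of "month is 9 or 8" is "month is not 9 and month is not 8", so the second filter needs '&' rather than '|'. Replace the split above with the following: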

train_X1 = train[(train['month'] == 9) | (train['month'] == 8)]
train_X2 = train[(train['month'] != 9) & (train['month'] != 8)]
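
As an alternative to writing both conditions by hand, the mask can be built once and then negated. The sketch below is not part of the original answer; it assumes a small stand-in frame built from the sample rows in the question, and the name is_aug_sep is made up for illustration. It uses Series.isin and the '~' operator, and shows that DataFrame.query (which the question asked about) gives the same two parts.

```python
import pandas as pd

# Small stand-in for the asker's `train` frame (values taken from the question).
train = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2],
    "Sales": [5263, 5020, 4782, 5011, 6102],
    "year": [2015] * 5,
    "month": [7, 7, 7, 8, 9],
    "day": [31, 30, 29, 28, 27],
})

# Build the boolean mask once: True where month is 8 or 9.
is_aug_sep = train["month"].isin([8, 9])

train_X1 = train[is_aug_sep]    # rows with month 8 or 9
train_X2 = train[~is_aug_sep]   # everything else ('~' negates the mask)

# The same split expressed with DataFrame.query.
query_X1 = train.query("month in [8, 9]")
query_X2 = train.query("month not in [8, 9]")
```

Because the second part is the exact negation of the first, every row lands in exactly one of the two frames, and adding another month to the list only requires changing one place.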
