Reputation: 122148
I have a pandas DataFrame that looks like this:
   Store  Sales  year  month  day
0      1   5263  2015      7   31
1      1   5020  2015      7   30
2      1   4782  2015      7   29
3      2   5011  2015      8   28
4      2   6102  2015      9   27
[986159 rows x 5 columns]
I need to split the data into two sets: one containing the rows where month is 8 or 9, and another containing the rest of the training data.
I tried it like this, but it doesn't work:
# Dataframe with 8 and 9 months
train_X1 = train[train['month'] == 9 or train['month'] == 8]
# The rest of the data
train_X2 = train[train['month'] != 9 or train['month'] != 8]
I could also do this, but it only gets me the part of the data with months 8 and 9; the rest isn't captured:
train8 = train[train['month'] == 8]
train9 = train[train['month'] == 9]
train89 = train8 + train9
How do I split the dataframe into 2 parts, where one of them has specific values, without splitting it twice? (Maybe with dataframe.query() or pandas.train_test_split()?)
Upvotes: 2
Views: 274
Reputation: 21441
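The syntax of the operation is incorrect: Python's 'or' and 'and' cannot combine pandas Series element-wise, and using them here raises a ValueError about the ambiguous truth value of a Series. Wrap each predicate in parentheses and combine them with '|' (or) and '&' (and). Note that the second set needs '&', because a row belongs to the rest only if its month is neither 9 nor 8. Replace the split above with the following: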
train_X1 = train[(train['month'] == 9) | (train['month'] == 8)]
train_X2 = train[(train['month'] != 9) & (train['month'] != 8)]
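As a possible alternative, here is a minimal sketch that builds the condition once and reuses it, so the two sets cannot drift apart. It assumes the same column names as in the question; the small `train` frame is only a stand-in built from the five rows shown there, and `mask`, `train_q1` and `train_q2` are names made up for illustration. The `query()` lines show roughly what the question hinted at.

```python
import pandas as pd

# Stand-in for the real data: the five sample rows from the question.
train = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2],
    "Sales": [5263, 5020, 4782, 5011, 6102],
    "year": [2015, 2015, 2015, 2015, 2015],
    "month": [7, 7, 7, 8, 9],
    "day": [31, 30, 29, 28, 27],
})

# One boolean mask: True for rows whose month is 8 or 9.
mask = train["month"].isin([8, 9])

train_X1 = train[mask]    # rows with month 8 or 9
train_X2 = train[~mask]   # every other row (~ inverts the mask)

# The same selection expressed with DataFrame.query().
train_q1 = train.query("month in [8, 9]")
train_q2 = train.query("month not in [8, 9]")
```

Because `train_X2` is defined as the inverse of the same mask, every row ends up in exactly one of the two sets.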
Upvotes: 1