Reputation: 439
I am trying to grab all the values of one column based on the value of another. I found some helpful stackoverflow questions already that are related to mine, but the solution in those don't seem to work on a variable range. Do I need to do something different for a variable?
I am trying to only grab the values of column 'open', from the dataset where the value of 'month' equals the month variable in the loop.
To be clear, the expected output is only the 'open' values.
for year in dfClose['year'].unique():
tempYearDF = dfClose[dfClose['year'] == year]
for month in range(1,13):
tempOpenDF = tempYearDF.loc[tempYearDF['month'] == month, 'open']
I plan to do more manipulating to the tempOpenDF variable after assigning the data, but I first need to verify it is populating.
Sample data
dfClose
open year month day date
0 30.490000 2010 1 4 2010-01-04
1 30.657143 2010 1 5 2010-01-05
2 30.625713 2010 1 6 2010-01-06
3 30.250000 2010 1 7 2010-01-07
4 30.042856 2010 1 8 2010-01-08
.
.
2551 297.260010 2020 2 24 2020-02-24
2552 300.950012 2020 2 25 2020-02-25
2553 286.529999 2020 2 26 2020-02-26
2554 281.100006 2020 2 27 2020-02-27
2555 257.260010 2020 2 28 2020-02-28
Output
tempOpenDF
Series([], Name: open, dtype: float64)
Data types
tempYearDF.dtypes
open float64
year int64
month int64
day int64
date object
dtype: object
All the data for "year" is correctly separating, just having trouble grabbing month data now.
tempYearDF
open year month day date
2516 296.239990 2020 1 2 2020-01-02
2517 297.149994 2020 1 3 2020-01-03
2518 293.790009 2020 1 6 2020-01-06
2519 299.839996 2020 1 7 2020-01-07
2520 297.160004 2020 1 8 2020-01-08
2521 307.239990 2020 1 9 2020-01-09
2522 310.600006 2020 1 10 2020-01-10
2523 311.640015 2020 1 13 2020-01-13
2524 316.700012 2020 1 14 2020-01-14
2525 311.850006 2020 1 15 2020-01-15
2526 313.589996 2020 1 16 2020-01-16
2527 316.269989 2020 1 17 2020-01-17
2528 317.190002 2020 1 21 2020-01-21
2529 318.579987 2020 1 22 2020-01-22
2530 317.920013 2020 1 23 2020-01-23
2531 320.250000 2020 1 24 2020-01-24
2532 310.059998 2020 1 27 2020-01-27
2533 312.600006 2020 1 28 2020-01-28
2534 324.450012 2020 1 29 2020-01-29
2535 320.540009 2020 1 30 2020-01-30
2536 320.929993 2020 1 31 2020-01-31
2537 304.299988 2020 2 3 2020-02-03
2538 315.309998 2020 2 4 2020-02-04
2539 323.519989 2020 2 5 2020-02-05
2540 322.570007 2020 2 6 2020-02-06
2541 322.369995 2020 2 7 2020-02-07
2542 314.179993 2020 2 10 2020-02-10
2543 323.600006 2020 2 11 2020-02-11
2544 321.470001 2020 2 12 2020-02-12
2545 324.190002 2020 2 13 2020-02-13
2546 324.739990 2020 2 14 2020-02-14
2547 315.359985 2020 2 18 2020-02-18
2548 320.000000 2020 2 19 2020-02-19
2549 322.630005 2020 2 20 2020-02-20
2550 318.619995 2020 2 21 2020-02-21
2551 297.260010 2020 2 24 2020-02-24
2552 300.950012 2020 2 25 2020-02-25
2553 286.529999 2020 2 26 2020-02-26
2554 281.100006 2020 2 27 2020-02-27
2555 257.260010 2020 2 28 2020-02-28
If I use an actual value for the equals too, I get the results I want. But when I try use that value based on the range loop value, it breaks.
Good
tempYearDF.loc[tempYearDF['month'] == 1, 'open']
2516 296.239990
2517 297.149994
2518 293.790009
2519 299.839996
2520 297.160004
2521 307.239990
2522 310.600006
2523 311.640015
Upvotes: 3
Views: 4669
Reputation: 2048
for month in range(1,13):
tempOpenDF = tempYearDF.loc[tempYearDF['month'] == month, 'open']
loc
= location, or "named" items
You might want iloc
, but also, isn't tempYearDF['month']
an entire column?
You might want to refer to tempYearDF['month'].value
or tempYearDF['month'].the_name_of_this_column
(or whatever the appropriate method/attribute is).
df[df["month"] ==1]
is a slice with 21 rows and all the columns
df.loc[df["month"] ==1]
is also a slice with 21 rows and all the columns
'df.loc[df["month"] ==1, "open"
does return the 21 rows just in the open
column when month equals 1.
Where are you saving this too? tempOpenDF
is within the for
loop. Its value will just change with every index of the loop.
I would have to see more of where this is passed to. As it stands you filter correctly, but sending that filtered data to nowhere.
What you have works otherwise.
import pandas as pd
df = pd.read_csv("sample_data.csv",sep='\t',parse_dates=["date"])
# sample data is what you provided above, using tab separation
#
some_year = 2020
print(df.loc[df["month"] == 1, 'open'],'\n')
print(df.loc[df["year"] == 2020, 'open'],'\n')
# print(df.loc[(df["month"] == 1 and df["year"] == 2020), 'open'])
for i in range(1,13):
dfy = df.loc[df["year"] == 2020]
mondata = dfy.loc[dfy["month"] == i, "open"]
print("Month: ",i,'\n',mondata,"\n")
>>> df.head()
some_index open year month day date
0 2516 296.239990 2020 1 2 2020-01-02
1 2517 297.149994 2020 1 3 2020-01-03
2 2518 293.790009 2020 1 6 2020-01-06
3 2519 299.839996 2020 1 7 2020-01-07
4 2520 297.160004 2020 1 8 2020-01-08
true index is 0,1, etc. some_index
came from your data.
Upvotes: 0
Reputation: 320
Sample data frame:
0 1 2
0 123 2020 1
1 234 2020 2
2 543 2020 1
# For all unique years
for y in df[1].unique():
# For all unique months
for m in df[2].unique():
# Get the row based on the month
row = df.loc[df[2] == m]
# Print only the desired column
print(row[0])
Output:
0 123
2 543
Name: 0, dtype: int64
1 234
Name: 0, dtype: int64
Upvotes: 0