Reputation: 21
I am trying to find a way to read just one value from a big dataframe in Python. I have 2 data tables in my project.
One looks like this:
Company ID   Company   201512   201511   ...   199402   199401
1234         abc       1.1      0.8      ...   2.1      -0.9
...
4321         cba       2.1      -0.4     ...   0.3      -0.1
There are about 260 months and 10,000 companies. For each monthly return, I need to check whether there are 36 valid data points behind it, where "valid" means the value is neither "0" nor "NaN". If there are 36 valid data points, I need to run a regression of those 36 data points against 7 factors, which are listed in another table.
The other table looks like this:
Month    Factor1   Factor2   ...   Factor6   Factor7
201512   -0.4      1.1       ...   2.1       1.2
...
199401   0.1       0.2       ...   0.3       0.4
Now my problem is, I couldn't find a way to load just one value at a time from table 1 and create a loop for it. Can someone please advise?
Upvotes: 0
Views: 667
Reputation: 9946
You don't want a for loop for this.
Assuming 0 is a valid monthly return and that you only have 36 columns after Company, you can easily find all companies with valid monthly return data:
df = df[df.notnull().all(axis=1)]
If, for some reason, you want to get rid of 0s, you can do a replace first:
import numpy as np
df = df[df.replace(0, np.nan).notnull().all(axis=1)]
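To make the two filters above concrete, here is a minimal sketch on made-up data. The column names follow the question, but the values and the return_cols list are assumptions for illustration. It restricts the check to the return columns so that identifier columns are never touched by the replace.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the question; the values are made up.
df = pd.DataFrame(
    {
        "Company ID": [1234, 4321, 5678],
        "Company": ["abc", "cba", "xyz"],
        "201512": [1.1, 2.1, 0.0],
        "201511": [0.8, np.nan, 0.5],
    }
)

# Assumed list of return columns (everything after Company).
return_cols = ["201512", "201511"]

# Keep rows with no NaN among the return columns.
no_nan = df[df[return_cols].notnull().all(axis=1)]

# Keep rows with neither NaN nor 0 among the return columns.
no_nan_no_zero = df[df[return_cols].replace(0, np.nan).notnull().all(axis=1)]
```

Here no_nan keeps abc and xyz, while no_nan_no_zero keeps only abc, because xyz has a 0 return in 201512.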
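Edit for the comment:
You could do something like:
cols = df.columns
first_col = get_first_return_col(df)  # position of the first return column
for i in range(first_col, len(cols) - 35):  # only full 36-month windows
    window = cols[i : i + 36]
    # filter into a new variable so df is not shrunk on every pass
    valid = df[df[window].notnull().all(axis=1)]
    run_regression(valid[window])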
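One way the windowed idea could be fleshed out is sketched below. Everything here is an assumption rather than the answerer's method: rolling_factor_betas is a hypothetical helper, the regression is plain least squares via numpy.linalg.lstsq with an intercept, a window is taken to be a month plus the columns that follow it, and the factor table is assumed to be indexed by the same month labels as the return columns. The demo uses 2 factors and a 4-month window on made-up data to keep it short.

```python
import numpy as np
import pandas as pd


def rolling_factor_betas(returns, factors, window=36):
    """Yield (company, first_month, coefficients) for every fully valid window.

    returns: rows = companies, columns = months.
    factors: rows = months (same labels as returns.columns), columns = factors.
    """
    months = list(returns.columns)
    valid = returns.notnull() & (returns != 0)  # treat NaN and 0 as invalid
    for start in range(len(months) - window + 1):
        win = months[start : start + window]
        ok = valid[win].all(axis=1)  # companies with a complete window
        # Design matrix: intercept column plus the factor values for the window.
        X = np.column_stack([np.ones(window), factors.loc[win].to_numpy()])
        for company, y in returns.loc[ok, win].iterrows():
            coef, *_ = np.linalg.lstsq(X, y.to_numpy(dtype=float), rcond=None)
            yield company, win[0], coef


# Made-up demo data: returns are an exact linear function of the factors.
months = ["201512", "201511", "201510", "201509", "201508", "201507"]
rng = np.random.default_rng(0)
factors = pd.DataFrame(
    rng.normal(size=(6, 2)), index=months, columns=["Factor1", "Factor2"]
)
y = 0.5 + factors.to_numpy() @ np.array([2.0, -1.0])
returns = pd.DataFrame([y, y.copy()], index=["abc", "cba"], columns=months)
returns.loc["cba", "201507"] = np.nan  # breaks only the last window for cba

results = list(rolling_factor_betas(returns, factors, window=4))
```

The outer loop runs once per window (about 225 passes for 260 months), and the validity check inside it is vectorized across all companies, so only the regressions themselves run per company.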
Upvotes: 0
Reputation:
You can iterate over rows with the following code:
for index, row in df.iterrows():
    company = row["Company"]  # access a column by name
Here index is the index of the row, and you can access the columns with, for example, row["Company"].
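As a small illustration of the row-by-row approach, the sketch below collects the companies whose return values are all present. The data and the return_cols list are made up for the example. Note that iterrows builds a Series for every row, so on roughly 10,000 rows it will likely be much slower than a vectorized filter.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the question; the values are made up.
df = pd.DataFrame(
    {
        "Company ID": [1234, 4321],
        "Company": ["abc", "cba"],
        "201512": [1.1, 2.1],
        "201511": [0.8, np.nan],
    }
)

# Assumed list of return columns.
return_cols = ["201512", "201511"]

complete = []
for index, row in df.iterrows():
    # row is a Series indexed by column name; select the return values.
    if row[return_cols].notnull().all():
        complete.append(row["Company"])
```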
Upvotes: 1