Reputation: 13660
I have a pandas dataframe and I want to make the last N columns null values. N is dependent on the value in another column.
Here is an example:
df = pd.DataFrame(np.random.randn(4, 5))
df['lookup_key'] = df.index #(actual data does not use index here)
lkup_dict = {0:1,1:2,2:2,3:3}
In this DataFrame, I want to use the value in the 'lookup_key' column to determine which columns to set to null.
Row 0 -> df.ix[0,lkup_dict[0]:4] = np.nan #key = 0, value = 1
Row 1 -> df.ix[1,lkup_dict[1]:4] = np.nan #key = 1, value = 2
Row 2 -> df.ix[2,lkup_dict[2]:4] = np.nan #key = 2, value = 2
Row 3 -> df.ix[3,lkup_dict[3]:4] = np.nan #key = 3, value = 3
The end result looking like this:
0 1 2 3 4 lookup_key
0 -0.882864 NaN NaN NaN NaN 0
1 1.358663 -0.024898 NaN NaN NaN 1
2 0.885058 0.673621 NaN NaN NaN 2
3 -1.487506 0.031021 -1.313646 NaN NaN 3
In this example I have to manually type out the df.ix... for each row. I need something that will do this for all rows of my DataFrame
Upvotes: 0
Views: 60
Reputation: 7188
You can do this with a for loop. To demonstrate, I generate a DataFrame with some random values. I then insert a lookup_key
column in the front with some random integers. Finally, I generate lkup_dict
dictionary with some random values.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
>>> df.insert(0, 'lookup_key', np.random.randint(0, 5, 10))
>>> print df
lookup_key A B C D
0 0 0.048738 0.773304 -0.912366 -0.832459
1 3 -0.573221 -1.381395 -0.644223 1.888484
2 0 0.198043 -0.751243 0.138277 2.006188
3 2 -1.692605 -1.586282 -0.656690 0.647510
4 3 -0.847591 -0.368447 0.510250 -0.172055
5 1 0.927243 -0.447478 0.796221 0.372763
6 3 0.027285 0.177276 1.087456 -0.420614
7 4 -1.147004 -0.172367 -0.767347 -0.855318
8 1 -0.649695 -0.572409 -0.664149 0.863050
9 4 -0.820982 -0.499889 -0.624889 1.397271
>>> lkup_dict = {i: np.random.randint(0, 5) for i in range(5)}
>>> print lkup_dict
{0: 3, 1: 0, 2: 0, 3: 4, 4: 1}
Now I iterate over the rows in the DataFrame. key
gets the value under the lookup_key
column for that row. nNulls
uses the key to get the number of null values from lkup_dict
. startIndex
gets the index for the first column with a null value in that row. The final line replaces the relevant values with null values.
>>> for i, row in df.iterrows():
... key = row['lookup_key'].astype(int)
... nNulls = lkup_dict[key]
... startIndex = df.shape[1] - nNulls
... df.loc[i, startIndex:] = np.nan
>>> print df
lookup_key A B C D
0 0 0.048738 NaN NaN NaN
1 3 NaN NaN NaN NaN
2 0 0.198043 NaN NaN NaN
3 2 -1.692605 -1.586282 -0.656690 0.647510
4 3 NaN NaN NaN NaN
5 1 0.927243 -0.447478 0.796221 0.372763
6 3 NaN NaN NaN NaN
7 4 -1.147004 -0.172367 -0.767347 NaN
8 1 -0.649695 -0.572409 -0.664149 0.863050
9 4 -0.820982 -0.499889 -0.624889 NaN
That's it. Hopefully that's what you're looking for.
Upvotes: 2