Reputation: 319
I have a DataFrame of users and their ratings for movies:
userId movie1 movie2 movie3 movie4 movie5 movie6
0 4.1 NaN 1.0 NaN 2.1 NaN
1 3.1 1.1 3.4 1.4 NaN NaN
2 2.8 NaN 1.7 NaN 3.0 NaN
3 NaN 5.0 NaN 2.3 NaN 2.1
4 NaN NaN NaN NaN NaN NaN
5 2.3 NaN 2.0 4.0 NaN NaN
There isnt actually a userId column in the dataframe, it's just being used for the index
From this DataFrame, I'm trying to make a another DataFrame that only contain movies that have been rated by a specific user. For example if I wanted to make a new DataFrame of movies only rated by user with userId == 0
the output would a dataframe with:
userId movie1 movie3 movie5
0 4.1 1.0 2.1
1 3.1 3.4 NaN
2 2.8 1.7 3.0
3 NaN NaN NaN
4 NaN NaN NaN
5 2.3 2.0 NaN
I know how to iterate over the columns but I dont know how to select the columns I want by checking a row value.
Upvotes: 1
Views: 503
Reputation: 13841
Declare and loc
the userId of interest into a new df
keeping only the relevant columns.
Then pd.concat
the new df
with the other userId's and keep columns (movies) of your userId that you selected:
user = 0 # set your userId
a = df.loc[[user]].dropna(axis=1)
b = pd.concat([a, (df.drop(a.index))[[i for i in a.columns]]])
Which prints:
b
movie1 movie3 movie5
userId
0 4.10 1.00 2.10
1 3.10 3.40 NaN
2 2.80 1.70 3.00
3 NaN NaN NaN
4 NaN NaN NaN
5 2.30 2.00 NaN
Note that I have set the index
to be userId
as you specified.
Upvotes: 0
Reputation: 71707
You can use .loc
accessor to select the particular userId
then use notna
to create a boolean mask which specifies the columns which does not contain NaN
values, finally use this boolean mask to filter the columns:
userId = 0 # specify the userid here
df_user = df.loc[:, df.loc[userId].notna()]
Details:
>>> df.loc[userId].notna()
movie1 True
movie2 False
movie3 True
movie4 False
movie5 True
movie6 False
Name: 0, dtype: bool
>>> df.loc[:, df.loc[userId].notna()]
movie1 movie3 movie5
userId
0 4.1 1.0 2.1
1 3.1 3.4 NaN
2 2.8 1.7 3.0
3 NaN NaN NaN
4 NaN NaN NaN
5 2.3 2.0 NaN
Upvotes: 2
Reputation: 296
Another approach:
import pandas as pd
user0 = df.iloc[0,:] #select the first row
flags = user0.notna() #flag the non NaN values
flags = flags.tolist() #convert to list instead of series
newdf = df.iloc[:,flags] #return all rows, and the columns where flags are true
Upvotes: 0