Reputation: 3028
I have a dataset that looks something like this:
data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3", 0.22)]
I need to create a list of the first value of every tuple. This is what I do:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array(data))
values = df.iloc[:, 0].unique()
This gives me the expected list:
['patient 1', 'patient 2', 'patient 3']
But sometimes my dataset has missing values, so it may look like this:
data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]
As you can see, the value for patient 3 is missing. When I run the same code again, instead of getting the list of first values, I get the original data back:
[('patient 1', 0.44), ('patient 2', 0.14), ('patient 3',)]
How can I make sure I still get the list of first values even when some tuples are incomplete?
Note: I know I can extract the first values with plain Python, but since the dataset can be very large, I want to stick to pandas to get the result.
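The behaviour described above most likely comes from the `np.array(data)` step rather than from pandas. NumPy cannot build a two-dimensional array from tuples of different lengths: older versions fall back to a one-dimensional array whose elements are the tuples themselves (pre-1.24 releases with NEP 34 support also emit a deprecation warning), while NumPy 1.24 and later raise a `ValueError` unless `dtype=object` is passed. The sketch below passes `dtype=object` explicitly so it reproduces the fallback case on any recent version:

```python
import numpy as np
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# Rows of unequal length cannot form a 2-D array, so with dtype=object
# NumPy builds a 1-D array whose elements are the whole tuples.
arr = np.array(data, dtype=object)
print(arr.shape)  # (3,)

# The DataFrame therefore has a single column holding tuples, so
# column 0 is the tuples themselves rather than the patient names.
df = pd.DataFrame(arr)
print(df.shape)  # (3, 1)

firsts = df.iloc[:, 0].unique()
print(firsts)
```

Because the collapse happens before pandas ever sees the data, the fix is either to clean the tuples first or to skip the `np.array` call entirely.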
Upvotes: 0
Views: 49
Reputation: 3331
You could clean your data first by discarding incomplete tuples. Here is an example of how you could do it:
import numpy as np
import pandas as pd
data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]
# Keep only tuples that have both a name and a value; discard the rest
cleaned_data = [(x[0], x[1]) for x in data if len(x) > 1]
df = pd.DataFrame(np.array(cleaned_data))
values = df.iloc[:, 0].unique()
Output:
array(['patient 1', 'patient 2'], dtype=object)
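A possible variation on the cleaning approach, offered as a sketch rather than a tested recommendation: `np.array` on a mix of strings and floats coerces every cell to a string, so column 1 ends up holding text like `'0.44'` instead of numbers. Passing the cleaned list straight to `pd.DataFrame` avoids that. As in the output above, incomplete tuples such as patient 3 are still dropped.

```python
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# Keep only tuples that have both a name and a value
cleaned_data = [(x[0], x[1]) for x in data if len(x) > 1]

# Building the frame directly (no np.array round-trip) keeps column 1
# numeric instead of converting every cell to a string.
df = pd.DataFrame(cleaned_data)
values = df.iloc[:, 0].unique()

print(values)
print(df[1].dtype)  # float64
```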
Upvotes: 1
Reputation: 4827
I would suggest building the DataFrame directly from the list of tuples instead of going through np.array:
pd.DataFrame(data).fillna('')[0].values
Hope this helps.
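A sketch of why this works, assuming a reasonably recent pandas version: given a list of tuples of different lengths, `pd.DataFrame` pads the short ones with a missing value instead of collapsing the rows, so column `0` still holds every patient name. Here `fillna('')` only touches the padded cell in column 1, and `.unique()` can replace `.values` if duplicate names should be dropped:

```python
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# pandas pads the short tuple with a missing value, so the frame has
# two columns and patient 3 keeps its row.
df = pd.DataFrame(data)
print(df.shape)  # (3, 2)

# Column 0 holds the first element of every tuple
names = list(df[0].unique())
print(names)  # ['patient 1', 'patient 2', 'patient 3']

# The padded second value shows up as missing
print(df[1].isna().tolist())  # [False, False, True]
```

Wrapping the result in `list(...)` turns the NumPy array returned by `unique()` into a plain Python list, matching the output shown in the question.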
Upvotes: 0