Souvik Ray

Reputation: 3028

Unable to extract values from a list using pandas

I have a dataset that looks something like this:

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3", 0.22)]

I need to create a list of the first value of every tuple. This is what I do:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(data))
values = df.iloc[:, 0].unique()

This gives me the expected result:

['patient 1', 'patient 2', 'patient 3']
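As an aside, routing the data through `np.array` is not needed here and has a side effect: NumPy promotes the mixed strings and floats to a single string dtype, so the numeric values become text. A minimal comparison, assuming only the complete rows from the example above:

```python
import numpy as np
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14)]

# np.array promotes str and float to one Unicode string dtype,
# so the second column ends up holding strings such as '0.44'.
via_numpy = pd.DataFrame(np.array(data))

# Passing the list of tuples straight to pandas keeps per-column types:
# column 0 holds the names, column 1 stays numeric (float64).
direct = pd.DataFrame(data)
```

For a large dataset, skipping the NumPy step also avoids building an intermediate string array.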

But sometimes my dataset has missing values, so it may look like this:

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

As you can see, the value for patient 3 is missing (its tuple has only one element). When I run the code above on this data, instead of a list of the first values I get the original tuples back unchanged:

[('patient 1', 0.44), ('patient 2', 0.14), ('patient 3',)]
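This behaviour most likely comes from the `np.array` call rather than from pandas: the tuples have different lengths, so NumPy cannot build a 2-D array. Older NumPy versions silently fall back to a 1-D array of `object` dtype whose elements are the tuples themselves (recent versions raise a `ValueError` unless `dtype=object` is passed explicitly). A small sketch reproducing this with an explicit `dtype=object`:

```python
import numpy as np
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# Ragged tuples cannot form a (3, 2) array, so NumPy builds a 1-D
# object array that stores each tuple as a single element.
arr = np.array(data, dtype=object)

# The DataFrame therefore has a single column containing whole tuples,
# which is why iloc[:, 0] returns the tuples instead of the names.
df = pd.DataFrame(arr)
```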

How can I make sure I still get the list of first values when the data is incomplete?

Note: I know I can extract the first values with plain Python, but since the dataset can be very large, I want to stick to pandas.

Upvotes: 0

Views: 49

Answers (2)

vlemaistre

Reputation: 3331

You could clean your data first. Here is an example of how to do it:

import numpy as np
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# Keep only tuples that have both a name and a value; discard the rest
cleaned_data = [(x[0], x[1]) for x in data if len(x) > 1]

df = pd.DataFrame(np.array(cleaned_data))
values = df.iloc[:, 0].unique()

Output:

array(['patient 1', 'patient 2'], dtype=object)
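Note that this approach drops patient 3 altogether. If you'd rather keep every patient and only mark the missing value, a variant of the same cleaning step could pad short tuples with `None` instead of discarding them. This is a sketch under that assumption; the column names `patient` and `value` are made up for illustration:

```python
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# Pad short tuples with None instead of discarding them.
padded = [(x[0], x[1] if len(x) > 1 else None) for x in data]

# Hypothetical column names, chosen for readability.
df = pd.DataFrame(padded, columns=["patient", "value"])
values = df["patient"].unique()
```

Here `values` contains all three patient names, and the missing value shows up as `NaN` in the `value` column.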

Upvotes: 1

René

Reputation: 4827

I would suggest:

pd.DataFrame(data).fillna('')[0].values

Hope this helps.
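This works because `pd.DataFrame` accepts a list of tuples of unequal length directly and pads the shorter rows with missing values, so column `0` always holds the first element of every tuple. The `fillna('')` step only fills the padded cell in the second column and is not needed if you just want the names. A minimal sketch without the NumPy round-trip:

```python
import pandas as pd

data = [("patient 1", 0.44), ("patient 2", 0.14), ("patient 3",)]

# pandas pads the one-element tuple, giving columns 0 and 1
# with a missing value in row 2, column 1.
df = pd.DataFrame(data)

# Column 0 holds the first element of every tuple; .unique() mirrors
# the call in the question, .tolist() turns the result into a plain list.
first_values = df[0].unique().tolist()
print(first_values)  # ['patient 1', 'patient 2', 'patient 3']
```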

Upvotes: 0
