justintime

Reputation: 101

Python Pandas - How to check a value in DataFrame

How do I find a missing row in a DataFrame and add it?

The DataFrame df:

    federalState    hasParking  Size
0   A               False       154
1   A               True        531
2   B               False       191
3   B               True        725
4   C               True        54
5   D               False       100
6   D               True        656

For federalState C, the row where hasParking is False is missing.

The final result should look like this:

    federalState    hasParking  Size
0   A               False       154
1   A               True        531
2   B               False       191
3   B               True        725
4   C               False       89
5   C               True        54
6   D               False       100
7   D               True        656

My code for adding the new row:

df.loc[-1] = ['C', 'False' , 89]  # adding a row
df.index = df.index + 1  # shifting index
df = df.sort_values(by=['federalState'])  # sorting by federalState

But how do I find out that the row is missing? My if statement does not work:

if ((df['federalState']=='C') and (df['hasParking']=='True')).any():

Upvotes: 1

Views: 78

Answers (2)

Tai

Reputation: 7984

IIUC, you want to check, within each label of the "federalState" column, whether some values are missing.

To find groups that do not contain both values, you can first group by federalState and then count the distinct values of the hasParking column with nunique():

df.groupby("federalState")["hasParking"].nunique()
federalState
A    2
B    2
C    1
D    2
Name: hasParking, dtype: int64
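A minimal sketch that turns the nunique() counts into a list of incomplete states: any state with fewer than 2 distinct hasParking values lacks either its True or its False row. The DataFrame is rebuilt from the question's table, and the names `counts` and `incomplete` are illustrative only.

```python
import pandas as pd

# DataFrame rebuilt from the question's table
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# Number of distinct hasParking values per state (2 means both True and False exist)
counts = df.groupby("federalState")["hasParking"].nunique()

# States with fewer than 2 distinct values are missing one combination
incomplete = counts[counts < 2].index.tolist()
print(incomplete)  # ['C']
```

Note that nunique() only tells you that a value is missing, not which one; the per-value check that follows answers that.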

To check whether a particular value exists in each group, you can try:

df.groupby("federalState")["hasParking"].apply(lambda g: g.isin([False]).any())

federalState
A     True
B     True
C    False    # does not contain False
D     True
Name: hasParking, dtype: bool
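As a sketch under the same assumptions (DataFrame rebuilt from the question, illustrative variable names), the boolean result above can be inverted to list the states that have no False row:

```python
import pandas as pd

# DataFrame rebuilt from the question's table
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# True for each state whose group contains at least one False in hasParking
has_false = df.groupby("federalState")["hasParking"].apply(lambda g: g.isin([False]).any())

# Keep the states where that check failed
no_false = has_false[~has_false].index.tolist()
print(no_false)  # ['C']
```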

Upvotes: 1

jezrael

Reputation: 862406

To chain conditions, use & instead of and. If hasParking is boolean, the == True comparison can be omitted.

There is a difference between True (a boolean) and 'True' (a string); since hasParking is a boolean column, remove the quotes.

if ((df['federalState']=='C') & (df['hasParking'])).any():
#same as
#if ((df['federalState']=='C') & (df['hasParking'] == True)).any():
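The points above can be checked with a small, self-contained sketch. It rebuilds the question's DataFrame and shows that `and` between two boolean Series raises ValueError, that comparing a boolean column with the string 'True' matches nothing, and how & combined with ~ (negation) detects that the (C, False) row is missing. The variable names are illustrative, not part of the original answer.

```python
import pandas as pd

# DataFrame rebuilt from the question's table
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

is_c = df["federalState"] == "C"

# Python's `and` needs one truth value for a whole Series,
# which pandas refuses to guess, so it raises ValueError
and_failed = False
try:
    _ = is_c and df["hasParking"]
except ValueError:
    and_failed = True

# hasParking holds booleans, so comparing it with the string 'True' never matches
string_match = (df["hasParking"] == "True").any()

# Element-wise & with ~ looks for a C row whose hasParking is False
c_false_exists = (is_c & ~df["hasParking"]).any()
if not c_false_exists:
    print("Row (C, False) is missing")
```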

For the first part (adding the row), you can call reset_index after sorting to get a default index:

df.loc[-1] = ['C', False , 89]  # adding a row
df = df.sort_values(by=['federalState']).reset_index(drop=True)
print (df)
  federalState  hasParking  Size
0            A       False   154
1            A        True   531
2            B       False   191
3            B        True   725
4            C        True    54
5            C       False    89
6            D       False   100
7            D        True   656

print (df.dtypes)
federalState    object
hasParking        bool
Size             int64
dtype: object
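A sketch of why the quotes matter when adding the row, assuming the question's data: inserting the string 'False' turns hasParking into an object column, while inserting the boolean False keeps it bool. The helper `make_df` is hypothetical and only rebuilds the original table.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: rebuilds the question's table
    return pd.DataFrame({
        "federalState": ["A", "A", "B", "B", "C", "D", "D"],
        "hasParking": [False, True, False, True, True, False, True],
        "Size": [154, 531, 191, 725, 54, 100, 656],
    })

# Row added with the string 'False', as in the question's code
df_str = make_df()
df_str.loc[-1] = ["C", "False", 89]

# Row added with the boolean False, as in this answer
df_bool = make_df()
df_bool.loc[-1] = ["C", False, 89]

print(df_str["hasParking"].dtype)   # object
print(df_bool["hasParking"].dtype)  # bool

# The string 'False' is not equal to the boolean False, so it is not counted
print(df_str["hasParking"].eq(False).sum())   # 3
print(df_bool["hasParking"].eq(False).sum())  # 4
```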

To find the missing combinations in the original df (before the row is added), use:

df1 = df.set_index(['federalState','hasParking'])['Size'].unstack().unstack().reset_index(name='val')
print (df1)
   hasParking federalState    val
0       False            A  154.0
1       False            B  191.0
2       False            C    NaN
3       False            D  100.0
4        True            A  531.0
5        True            B  725.0
6        True            C   54.0
7        True            D  656.0

a = df1.loc[df1['val'].isnull(), ['federalState','hasParking']]
print (a)
  federalState  hasParking
2            C       False
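An alternative sketch, not from the original answers, builds the full grid of expected (federalState, hasParking) pairs with pd.MultiIndex.from_product and reindexes the data onto it; combinations that do not exist get NaN in Size. It assumes the question's data and that both True and False should appear for every state.

```python
import pandas as pd

# DataFrame rebuilt from the question's table
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# Every (state, parking flag) pair that is expected to exist
full_index = pd.MultiIndex.from_product(
    [df["federalState"].unique(), [False, True]],
    names=["federalState", "hasParking"],
)

# Align the data to the full grid; absent pairs get NaN in Size
complete = df.set_index(["federalState", "hasParking"]).reindex(full_index)

# List of (federalState, hasParking) tuples with no data
missing = complete[complete["Size"].isna()].index.tolist()
print(missing)
```

Because the expected grid is explicit, the same approach extends to more than two index levels by adding lists to from_product.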

Upvotes: 3
