Reputation: 101
How do I find a missing line in the dataframe and add a new one?
The DataFrame df
federalState hasParking Size
0 A False 154
1 A True 531
2 B False 191
3 B True 725
4 C True 54
5 D False 100
6 D True 656
For df['federalState'], the False row for C is missing.
The final result should look like this
federalState hasParking Size
0 A False 154
1 A True 531
2 B False 191
3 B True 725
4 C False 89
5 C True 54
6 D False 100
7 D True 656
My code for adding the new line
df.loc[-1] = ['C', 'False' , 89] # adding a row
df.index = df.index + 1 # shifting index
df = df.sort_values(by=['federalState']) # sorting by federalState
But how do I find out that the line is missing? My if-statement does not work:
if ((df['federalState']=='C) and (df['hasParking']=='True')).any():
Upvotes: 1
Views: 78
Reputation: 7984
IIUC, you want to check, for each label in the "federalState" column, whether any hasParking value is missing.
To find the groups that do not contain every value, you can first group by federalState and then count the distinct values in the hasParking column with nunique(). A group with fewer than 2 distinct values is missing either True or False.
df.groupby("federalState")["hasParking"].nunique()
federalState
A 2
B 2
C 1
D 2
Name: hasParking, dtype: int64
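As a small follow-up sketch, the counts above can be filtered to list only the incomplete states. The threshold of 2 assumes hasParking can only be True or False, and the names counts and incomplete are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# Number of distinct hasParking values per state.
counts = df.groupby("federalState")["hasParking"].nunique()

# States that do not have both True and False.
incomplete = counts[counts < 2].index.tolist()
print(incomplete)
```

This only tells you which state is incomplete, not which value is absent.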
To check existence of a particular element in a group, you can try
df.groupby("federalState")["hasParking"].apply(lambda g: g.isin([False]).any())
federalState
A True
B True
C False # does not contain False
D True
Name: hasParking, dtype: bool
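If you prefer to avoid the lambda, a roughly equivalent sketch is to negate the boolean column and group it by the state column directly. This assumes hasParking really has dtype bool; the names has_false and missing_false are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# ~df["hasParking"] is True wherever hasParking is False,
# so any() per state answers "does this state have a False row?".
has_false = (~df["hasParking"]).groupby(df["federalState"]).any()

# Keep only the states where the answer is no.
missing_false = has_false[~has_false].index.tolist()
print(missing_false)
```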
Upvotes: 1
Reputation: 862406
To chain conditions, use & instead of and. If hasParking is boolean, the == True comparison can be omitted.
There is a difference between True (a boolean) and 'True' (a string). I think you need to remove the quotes, because the column is boolean.
if ((df['federalState']=='C') & (df['hasParking'])).any():
#same as
#if ((df['federalState']=='C') & (df['hasParking'] == True)).any():
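Here is a minimal runnable sketch of that condition on the question's data. Python's and tries to reduce each whole Series to a single bool and raises a ValueError, while & combines the two masks element-wise. The variable names has_c_true and has_c_false are only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

is_c = df["federalState"] == "C"

# Each comparison is wrapped in parentheses because & binds tighter than ==.
has_c_true = (is_c & (df["hasParking"])).any()
has_c_false = (is_c & (~df["hasParking"])).any()

if not has_c_false:
    print("The row (C, False) is missing")
```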
For the first part, you can add reset_index(drop=True) after sorting to get a default index:
df.loc[-1] = ['C', False , 89] # adding a row
df = df.sort_values(by=['federalState']).reset_index(drop=True)
print (df)
federalState hasParking Size
0 A False 154
1 A True 531
2 B False 191
3 B True 725
4 C True 54
5 C False 89
6 D False 100
7 D True 656
print (df.dtypes)
federalState object
hasParking bool
Size int64
dtype: object
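Note that sorting by federalState alone leaves the new row after the existing C row. If the order from the question is wanted (False before True within each state), one possible sketch is to sort by both columns. Using pd.concat instead of df.loc[-1] is my own substitution here, and new_row and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# The row to add, with a real boolean (not the string 'False').
new_row = pd.DataFrame({"federalState": ["C"], "hasParking": [False], "Size": [89]})

result = (
    pd.concat([df, new_row], ignore_index=True)
    .sort_values(by=["federalState", "hasParking"])  # False sorts before True
    .reset_index(drop=True)
)
print(result)
```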
To find the missing values, use:
df1 = df.set_index(['federalState','hasParking'])['Size'].unstack().unstack().reset_index(name='val')
print (df1)
hasParking federalState val
0 False A 154.0
1 False B 191.0
2 False C NaN
3 False D 100.0
4 True A 531.0
5 True B 725.0
6 True C 54.0
7 True D 656.0
a = df1.loc[df1['val'].isnull(), ['federalState','hasParking']]
print (a)
federalState hasParking
2 C False
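An alternative sketch that avoids reshaping: build every expected (federalState, hasParking) pair and subtract the pairs that are actually present. This assumes the only valid hasParking values are False and True; the names present, expected and missing are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# Pairs that exist in the data.
present = pd.MultiIndex.from_frame(df[["federalState", "hasParking"]])

# Every pair that should exist: each state combined with False and True.
expected = pd.MultiIndex.from_product(
    [df["federalState"].unique(), [False, True]],
    names=["federalState", "hasParking"],
)

# Pairs that are expected but not present.
missing = expected.difference(present)
print(list(missing))
```

Each entry of missing is a (federalState, hasParking) tuple, so it can be looped over to append the missing rows.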
Upvotes: 3