Reputation: 41
How can I select rows from a DataFrame based on string values in a column in pandas? I want to display only the states, which are in all caps. The state rows hold the totals for their cities.
import pandas as pd
import matplotlib.pyplot as plt
%pylab inline
d = pd.read_csv("states.csv")
print(d)
# States/cities B C D
# 0 FL 3 5 6
# 1 Orlando 1 2 3
# 2 Miami 1 1 3
# 3 Jacksonville 1 2 0
# 4 CA 8 3 2
# 5 San diego 3 1 0
# 6 San Francisco 5 2 2
# 7 WA 4 2 1
# 8 Seattle 3 1 0
# 9 Tacoma 1 1 1
How can I display it like this?
# States/cities B C D
# 0 FL 3 5 6
# 4 CA 8 3 2
# 7 WA 4 2 1
Upvotes: 2
Views: 8017
Reputation: 38415
You can use str.contains to filter out any row that contains lowercase letters:
df[~df['States/cities'].str.contains('[a-z]')]
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
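As a self-contained sketch of the filter above, assuming the sample data from the question is rebuilt by hand rather than read from states.csv (the names has_lower and states_only are illustrative):

```python
import pandas as pd

# Sample data typed in from the question (assumption: same values as states.csv)
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "Miami", "Jacksonville", "CA",
                      "San diego", "San Francisco", "WA", "Seattle", "Tacoma"],
    "B": [3, 1, 1, 1, 8, 3, 5, 4, 3, 1],
    "C": [5, 2, 1, 2, 3, 1, 2, 2, 1, 1],
    "D": [6, 3, 3, 0, 2, 0, 2, 1, 0, 1],
})

# True where the value contains at least one lowercase ASCII letter
has_lower = df["States/cities"].str.contains("[a-z]", regex=True)

# Keep only the rows without any lowercase letter
states_only = df[~has_lower]
print(states_only)
```

Note that this keeps any value with no lowercase letters, so a value such as "12" would also pass the filter.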
Upvotes: 1
Reputation: 323236
Assuming the order is always a state followed by the cities in that state, we can use where and dropna:
df['States/cities'] = df['States/cities'].where(df['States/cities'].isin(['FL', 'CA', 'WA']))
df = df.dropna()
df
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
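A possible variant of the where/dropna idea that leaves the original df untouched, sketched on a shortened copy of the data. The list state_codes and the names masked and result are assumptions made for illustration:

```python
import pandas as pd

# Shortened, hand-typed sample (assumption: not the asker's full CSV)
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "CA", "San diego", "WA", "Seattle"],
    "B": [3, 1, 8, 3, 4, 3],
})

state_codes = ["FL", "CA", "WA"]  # hypothetical list of known state codes

col = df["States/cities"]
# where() keeps values found in state_codes and replaces the rest with NaN;
# assign() returns a new frame instead of overwriting the column in df
masked = df.assign(**{"States/cities": col.where(col.isin(state_codes))})

# dropna() also returns a new frame; subset limits the check to the masked column
result = masked.dropna(subset=["States/cities"])
print(result)
```

Restricting dropna to the one column avoids dropping state rows that happen to have missing values elsewhere.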
Or filter on str.len:
df[df['States/cities'].str.len()==2]
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
Upvotes: 0
Reputation: 51335
You can get the rows with all-uppercase values in the column States/cities like this:
df.loc[df['States/cities'].str.isupper()]
States/cities B C D
0 FL 3 5 6
4 CA 8 3 2
7 WA 4 2 1
Just to be safe, you can add a condition so that it only returns the rows where 'States/cities' is uppercase and only 2 characters long (in case you had a value like SEATTLE):
df.loc[(df['States/cities'].str.isupper()) & (df['States/cities'].apply(len) == 2)]
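To see why the extra length check matters, here is a small sketch on made-up data that includes an all-caps city name. It uses .str.len() in place of .apply(len), which should behave the same for plain strings:

```python
import pandas as pd

# Hypothetical data with an all-caps city name mixed in
df = pd.DataFrame({
    "States/cities": ["FL", "SEATTLE", "Miami", "WA"],
    "B": [3, 3, 1, 4],
})

col = df["States/cities"]

# Uppercase check alone also lets SEATTLE through
upper_only = df.loc[col.str.isupper()]

# Uppercase check combined with a length of exactly 2
upper_two = df.loc[col.str.isupper() & (col.str.len() == 2)]

print(upper_only)
print(upper_two)
```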
Upvotes: 1
Reputation: 107642
Consider pandas.Series.str.match, passing a regex that accepts only [A-Z]:
states[states['States/cities'].str.match('^[A-Z]+$')]
# States/cities B C D
# 0 FL 3 5 6
# 4 CA 8 3 2
# 7 WA 4 2 1
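A rough sketch of how an anchored pattern behaves on mixed-case values, using made-up data. It also shows str.fullmatch, which assumes pandas 1.1 or later:

```python
import pandas as pd

# Hypothetical values chosen to exercise the pattern
states = pd.DataFrame({
    "States/cities": ["FL", "San diegO", "CA", "Wa", "SEATTLE"],
    "B": [3, 3, 8, 4, 3],
})

col = states["States/cities"]

# str.match anchors at the start; the trailing $ requires the match to reach the end
all_caps = states[col.str.match(r"^[A-Z]+$")]

# str.fullmatch must match the whole string, so no anchors are needed;
# {2} additionally limits the result to two-letter codes
two_caps = states[col.str.fullmatch(r"[A-Z]{2}")]

print(all_caps)
print(two_caps)
```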
Data
from io import StringIO
import pandas as pd
txt = '''"States/cities" B C D
0 FL 3 5 6
1 Orlando 1 2 3
2 Miami 1 1 3
3 Jacksonville 1 2 0
4 CA 8 3 2
5 "San diego" 3 1 0
6 "San Francisco" 5 2 2
7 WA 4 2 1
8 Seattle 3 1 0
9 Tacoma 1 1 1'''
states = pd.read_table(StringIO(txt), sep=r"\s+")
Upvotes: 1
Reputation: 1295
You can write a function to be applied to each value in the States/cities column. Have the function return either True or False, and the result of applying the function can act as a Boolean filter on your DataFrame.
This is a common pattern when working with pandas. In your particular case, you could check for each value in States/cities whether it's made of only uppercase letters.
So for example:
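def is_state_abbrev(string):
    # True when every cased character in the string is uppercase
    return string.isupper()
mask = d['States/cities'].apply(is_state_abbrev)
filtered_df = d[mask]
Here mask will be a pandas Series of True and False values. (It is named mask rather than filter to avoid shadowing Python's built-in filter.)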
You can also achieve the same result by using a lambda expression, as in:
filtered_df = d[d['States/cities'].apply(lambda x: x.isupper())]
This does essentially the same thing.
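For comparison, the element-wise apply can usually be swapped for the vectorized .str.isupper() accessor. A minimal sketch on a shortened, hand-typed copy of the data (the names by_apply and by_str are illustrative):

```python
import pandas as pd

# Shortened sample (assumption: stands in for the frame read from states.csv)
d = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "CA", "San diego"],
    "B": [3, 1, 8, 3],
})

# Element-wise: calls str.isupper on each value in Python
by_apply = d["States/cities"].apply(lambda x: x.isupper())

# Vectorized string accessor: same check without an explicit lambda
by_str = d["States/cities"].str.isupper()

filtered_df = d[by_str]
print(by_apply.tolist())
print(by_str.tolist())
print(filtered_df)
```

Both masks select the same rows here; the accessor form is shorter and does not raise on missing values the way calling x.isupper() on a NaN would.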
Upvotes: 1