Reputation: 95
I have a DataFrame with the following columns: id, text, created_at, retweet_count, favorite_count, source, user_id.
I want to create a new DataFrame that drops the rows whose df.text starts with "RT".
non_retweeted_list = []
for i in range(len(df)):
    if (df.text[i][0] and df.text[i][1]) == ('R' and 'T'):
        pass
    else:
        non_retweeted_list.append(df[i])
But I get the following KeyError:
KeyError                                  Traceback (most recent call last)
/home/bd/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
1944 try:
-> 1945 return self._engine.get_loc(key)
1946 except KeyError:
.
.
.
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-3-5dfc6d77a22c> in <module>()
5 pass
6 else:
----> 7 non_retweeted_list.append(df[i])
.
.
.
KeyError: 0
How can I fix it?
Upvotes: 1
Views: 150
Reputation: 1980
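It's the way you're referencing your index: on a DataFrame, df[i] selects a column labelled i, not the i-th row, so df[0] raises KeyError: 0 because there is no column named 0. Also, that's an odd way to check the first two characters, and it doesn't really compare both of them. Why are you doing it that way? What do you think about the way I'm showing below?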
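non_retweeted_list = []
for i in range(len(df)):
    # .iloc[i] gets the i-th text value by position; [0:2] slices the string
    if df['text'].iloc[i][0:2] == 'RT':
        pass
    else:
        # df.iloc[i] is the i-th row; df[i] would look up a column named i
        non_retweeted_list.append(df.iloc[i])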
Lastly, it's probably not a good idea to write an if/pass/else statement. Test the negated condition instead:
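import pandas as pd

non_retweeted_list = []
for i in range(len(df)):
    if df['text'].iloc[i][0:2] != 'RT':
        non_retweeted_list.append(df.iloc[i])
# each kept row is a Series; stacking them rebuilds a DataFrame
new_df = pd.DataFrame(non_retweeted_list)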
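As for why the question's condition `(df.text[i][0] and df.text[i][1]) == ('R' and 'T')` misbehaves: Python's `and` returns one of its operands rather than a boolean, so both sides collapse to the second character. A minimal check, using a made-up string:

```python
# `and` returns the first falsy operand, or the last operand if all are truthy
print('R' and 'T')                         # T

s = 'xT hello'                             # hypothetical text that does NOT start with "RT"
print((s[0] and s[1]) == ('R' and 'T'))    # True: effectively compares s[1] with 'T'
print(s.startswith('RT'))                  # False: the check actually intended
```

So any text whose second character is `T` would have been treated as a retweet.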
Upvotes: 1
Reputation: 863351
You need boolean indexing with str.startswith to build the mask:
import pandas as pd

df = pd.DataFrame({'text':['RT apple','dog','RT baladiska']})
print(df)
           text
0      RT apple
1           dog
2  RT baladiska
mask = df['text'].str.startswith('RT')
print (mask)
0 True
1 False
2 True
Name: text, dtype: bool
# filter out rows that start with RT
df1 = df[~mask]
print (df1)
text
1 dog
# keep only rows that start with RT
df2 = df[mask]
print (df2)
text
0 RT apple
2 RT baladiska
Alternatively:
mask = df['text'].str.contains('^RT')
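Applied to a frame shaped like the question's, a minimal sketch might look like the following. The sample rows are invented and only a few of the question's columns are used. `na=False` is passed so that a missing `text` value counts as a non-match; without it, `str.startswith` returns NaN for that row and the `~mask` filter no longer works on a clean boolean mask.

```python
import pandas as pd

# Hypothetical sample data using a subset of the question's columns
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'text': ['RT @someone: hello', 'original tweet', 'RT again', None],
    'retweet_count': [5, 0, 2, 1],
})

# True where text starts with "RT"; missing text is treated as False
mask = df['text'].str.startswith('RT', na=False)

# Keep the non-retweets and renumber the index from 0
non_retweets = df[~mask].reset_index(drop=True)
print(non_retweets)
```

Dropping `reset_index(drop=True)` keeps the original row labels (here 1 and 3), which can be useful if you need to map rows back to the source frame.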
Upvotes: 2