Behzat

Reputation: 95

How to remove specific rows from a dataframe

I have a dataframe with the following columns: id, text, created_at, retweet_count, favorite_count, source, user_id.

I want to get a new dataframe that excludes the rows whose df.text starts with "RT".

non_retweeted_list = []

for i in range(len(df)):
    if (df.text[i][0] and df.text[i][1]) == ('R' and 'T'):
        pass
    else:
        non_retweeted_list.append(df[i])

But I get the KeyError below:

KeyError                                  Traceback (most recent call last)
/home/bd/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

.
.
.
During handling of the above exception, another exception occurred:
KeyError                               Traceback (most recent call last)
<ipython-input-3-5dfc6d77a22c> in <module>()
  5         pass
  6     else: 
 ----> 7         non_retweeted_list.append(df[i])
 .
 .
 .
 KeyError: 0

How can I fix it?

Upvotes: 1

Views: 150

Answers (2)

Mohammad Athar

Reputation: 1980

It could be the way you're referencing your index. That is also an odd way to check the first two characters; why are you doing it that way? What do you think about the approach below?

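non_retweeted_list = []
for i in range(len(df)):
    # .iloc selects by position, so it works whatever the index labels are
    if 'RT' == df['text'].iloc[i][0:2]:
        pass
    else:
        # df.iloc[i] returns row i; df[i] would look for a column named i
        non_retweeted_list.append(df.iloc[i])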

Lastly, an if-pass statement is probably not a good idea. Test the negated condition instead:

non_retweeted_list = []
for i in range(len(df)):
    # Compare the first two characters of the i-th text value
    if 'RT' != df['text'].iloc[i][0:2]:
        non_retweeted_list.append(df.iloc[i])
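
To make the two problems in the question's loop concrete, here is a minimal sketch. The sample dataframe is hypothetical, not the asker's data. It shows that df[0] is a column lookup, which is where KeyError: 0 comes from, and that ('R' and 'T') evaluates to just 'T', so the original test never checked the first character.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's tweets
df = pd.DataFrame({'id': [10, 11, 12],
                   'text': ['RT apple', 'dog', 'AT home']})

# df[0] looks for a COLUMN labelled 0; none exists, hence KeyError: 0
try:
    df[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# 'and' returns its last operand when the first is truthy,
# so the right-hand side of the original comparison is just 'T'
and_result = ('R' and 'T')

# Positional row access with an explicit prefix test
non_retweeted = [df.iloc[i] for i in range(len(df))
                 if not df['text'].iloc[i].startswith('RT')]
kept_texts = [row['text'] for row in non_retweeted]

print(column_lookup_failed, and_result, kept_texts)
```

With the original condition, 'AT home' would have been treated as a retweet because only its second character was compared with 'T'; the explicit startswith test keeps it.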

Upvotes: 1

jezrael

Reputation: 863351

You need boolean indexing, with str.startswith to build the mask:

import pandas as pd

df = pd.DataFrame({'text':['RT apple','dog','RT baladiska']})
print (df)
           text
0      RT apple
1           dog
2  RT baladiska

mask = df['text'].str.startswith('RT')
print (mask)
0     True
1    False
2     True
Name: text, dtype: bool

#filter out rows whose text starts with RT
df1 = df[~mask]
print (df1)
  text
1  dog

#keep only rows whose text starts with RT
df2 = df[mask]
print (df2)
           text
0      RT apple
2  RT baladiska

Alternatively, use str.contains with a regex anchored at the start of the string:

mask = df['text'].str.contains('^RT')
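
On a dataframe shaped more like the one in the question, the same idea might look like the sketch below. The sample values, the missing text entry, the na=False argument and the reset_index call are assumptions added for illustration; they are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample using a few of the question's column names
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'text': ['RT apple', 'dog', None, 'RT baladiska'],
    'retweet_count': [5, 0, 1, 2],
})

# na=False makes missing text count as "does not start with RT",
# so the mask is purely boolean and can be inverted with ~
mask = df['text'].str.startswith('RT', na=False)

# Keep the non-retweets and renumber the rows from 0
non_retweets = df[~mask].reset_index(drop=True)
print(non_retweets)
```

Whether rows with missing text should be kept or dropped depends on the data; passing na=True instead would drop them along with the retweets.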

Upvotes: 2
