Reputation: 83
Novice programmer here seeking help. I have a DataFrame that looks like this:
Message
0 "Blah blah $AAPL"
1 "Blah blah $ABT"
2 "Blah blah $amzn"
3 "Blah blah $AMZN"
4 "Blah blah $KO"
5 "Blah blah $fb"
6 "Blah blah $GOOGL"
7 "Blah blah $BA"
8 "Blah blah $BMY"
My desired output is a new column that gives me the cashtag used in the tweet in uppercase, regardless of whether it was written in uppercase or lowercase. In this example it would be:
Message Cashtag
0 "Blah blah $AAPL" "$AAPL"
1 "Blah blah $ABT" "$ABT"
2 "Blah blah $amzn" "$AMZN"
3 "Blah blah $AMZN" "$AMZN"
4 "Blah blah $KO" "$KO"
5 "Blah blah $fb" "$FB"
6 "Blah blah $GOOGL" "$GOOGL"
7 "Blah blah $BA" "$BA"
8 "Blah blah $BMY" "$BMY"
How can I achieve my desired output?
Upvotes: 2
Views: 60
Reputation: 23099
IIUC,
df['Cashtag'] = df['Message'].str.upper().str.extract(r'(\$\w+)', expand=False)
print(df)
Message Cashtag
0 0 "Blah blah $AAPL" $AAPL
1 1 "Blah blah $ABT" $ABT
2 2 "Blah blah $amzn" $AMZN
3 3 "Blah blah $AMZN" $AMZN
4 4 "Blah blah $KO" $KO
5 5 "Blah blah $fb" $FB
6 6 "Blah blah $GOOGL" $GOOGL
7 7 "Blah blah $BA" $BA
8 8 "Blah blah $BMY" $BMY
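For anyone who wants to reproduce this without the original data, here is a minimal sketch. The sample messages, including the one without a cashtag, are made up for illustration. It uses a raw string for the pattern and `expand=False` so that `str.extract` returns a Series rather than a one-column DataFrame. Note that `\w+` also matches digits and underscores, so this is slightly more permissive than a letters-only pattern.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (mixed-case cashtags).
df = pd.DataFrame({
    "Message": [
        "Blah blah $AAPL",
        "Blah blah $amzn",
        "Blah blah $fb",
        "Blah blah $GOOGL",
        "no ticker here",
    ]
})

# Uppercase the text first, then capture "$" followed by word characters.
# expand=False makes str.extract return a Series for the single group.
df["Cashtag"] = df["Message"].str.upper().str.extract(r"(\$\w+)", expand=False)

print(df)
```

Rows with no cashtag end up as NaN in the new column.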
Upvotes: 1
Reputation: 662
This will pull the first cashtag (up to five letters, so that $GOOGL is captured whole) out of any string:
df['Cashtag'] = df['Message'].str.extract(r'(\$[A-Za-z]{1,5})', expand=False)
Check out the docs for Series.str.extract.
Better yet, so you can group by cashtags later, I’d recommend also converting them to upper case:
df['Cashtag'] = df['Message'].str.extract(r'(\$[A-Za-z]{1,5})', expand=False).str.upper()
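A quick sketch of how this behaves on messages with zero, one, or several cashtags. The sample messages are made up, and the five-letter limit is an assumption chosen so that `$GOOGL` is not truncated. `str.extract` only returns the first match; if you ever need all of them, `str.findall` is one option.

```python
import pandas as pd

# Hypothetical messages: one cashtag, several cashtags, and none at all.
messages = pd.Series([
    "Blah blah $amzn",
    "Blah blah $GOOGL",
    "Long $fb, short $KO",
    "nothing to see",
])

# Assumption: tickers are 1 to 5 letters long.
pattern = r"(\$[A-Za-z]{1,5})"

# First cashtag per message, normalized to upper case; NaN when none is found.
first = messages.str.extract(pattern, expand=False).str.upper()

# Every cashtag per message as a list (empty list when there is none).
all_tags = messages.str.upper().str.findall(r"\$[A-Z]{1,5}")

print(first)
print(all_tags)
```

With the upper-cased column in place, something like `df.groupby('Cashtag')` treats `$amzn` and `$AMZN` as the same ticker.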
Upvotes: 2