Neha Sharma

Reputation: 21

R data cleaning

I have a data frame (df1) that was scraped as a single column of data.

1
2  Amazon Pantry
3  Best Sellerin Soaps & Hand Wash
4
5  Palmolive Hygiene-Plus Sensitive Liquid Hand Wash, 300ml
6  Palmolive Hygiene-Plus Sensitive Liquid Hand Wash, 300ml
7   £0.90
8    ?
9
10  Palmolive Naturals Nourishing Liquid Hand Wash, 300ml
11  Palmolive Naturals Nourishing Liquid Hand Wash, 300ml
12  £0.90
13  ?
14
15  L'Oreal Men Expert Carbon Protect Deodorant 250ml
16  L'Oreal Men Expert Carbon Protect Deodorant 250ml
17  £1.50

To clean the data, I tried the commands below, aiming to get the product and pricing information into two separate columns. Can someone let me know if there is an alternative way of doing it?

install.packages("splitstackshape")
library(splitstackshape)  # cSplit() is only available once the package is loaded
newdf <- cSplit(df1, "Amazon_Normal_Text2", direction = "long")

Upvotes: 0

Views: 97

Answers (1)

sweetmusicality

Reputation: 937

This is just a thought process...

  1. Whenever a string contains "ml", work backward from "ml" to the preceding space, extract that piece, and store it in a volume variable. (substr)
  2. Extract everything from £ to the end of the string and store it in a price variable. (grep, regex, nchar)
  3. Extract everything from the beginning of the string up to the position where the volume starts and store it in a product variable. (substr, nchar)

Look into substr, nchar, grep, and regex.
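
For what it's worth, here is a rough base R sketch of the three steps above. It is only an illustration under some assumptions: df1 is rebuilt by hand from the sample in the question (the column name Amazon_Normal_Text2 is taken from the cSplit() call), each price row is assumed to sit directly below the product name it belongs to, and the names x, price_rows, full_name, and result are made up for this example. It uses sub() and regexpr()/regmatches() rather than substr()/nchar(), so treat it as one possible variant of the idea, not the only way to do it.

```r
# Hypothetical reconstruction of the scraped column from the question's sample.
df1 <- data.frame(
  Amazon_Normal_Text2 = c(
    "", "Amazon Pantry", "Best Sellerin Soaps & Hand Wash", "",
    "Palmolive Hygiene-Plus Sensitive Liquid Hand Wash, 300ml",
    "Palmolive Hygiene-Plus Sensitive Liquid Hand Wash, 300ml",
    "£0.90", "?", "",
    "Palmolive Naturals Nourishing Liquid Hand Wash, 300ml",
    "Palmolive Naturals Nourishing Liquid Hand Wash, 300ml",
    "£0.90", "?", "",
    "L'Oreal Men Expert Carbon Protect Deodorant 250ml",
    "L'Oreal Men Expert Carbon Protect Deodorant 250ml",
    "£1.50"
  ),
  stringsAsFactors = FALSE
)

x <- trimws(df1$Amazon_Normal_Text2)

# Step 2: rows containing the pound sign are treated as price rows.
price_rows <- which(grepl("£", x, fixed = TRUE))
price <- as.numeric(sub("£", "", x[price_rows], fixed = TRUE))

# Assumption: the product name is the row immediately above each price row.
full_name <- x[price_rows - 1]

# Step 1: the trailing "<digits>ml" token is the volume (NA when absent).
m <- regexpr("[0-9]+ ?ml$", full_name)
volume <- rep(NA_character_, length(full_name))
volume[m != -1] <- regmatches(full_name, m)

# Step 3: the product is whatever precedes the volume (and an optional comma).
product <- trimws(sub(",?\\s*[0-9]+ ?ml$", "", full_name))

result <- data.frame(product, volume, price, stringsAsFactors = FALSE)
print(result)
```

If the real scrape does not always place the price right under the product name, the full_name line is the part that would need adjusting.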

Upvotes: 0
