Maheswarha Rajagopal

Reputation: 51

How to skip the first whitespace match in regex [Python]?

I'm using the pandas 'read_csv' function to read a file that is not in CSV format. It contains no ',' (comma) that I could use as the delimiter; instead, the fields are separated by varying amounts of whitespace. The line below is one example:

Power Output 12(25%)   24(50%)  12(25%)

I would like to parse these lines with pandas.read_csv(sep='') using a regex as the separator, but I'm not sure how exactly it can be done. I know I can split on whitespace, but that would split Power Output into two different columns. I want a regex that matches every run of whitespace, regardless of its length, BUT skips the first match it finds.

I'm expecting the following output in the pandas dataframe later:

Col 1         Col 2    Col 3    Col 4
Power Output  12(25%)  24(50%)  12(25%)

Upvotes: 0

Views: 270

Answers (2)

mozway

Reputation: 260790

You can use whitespace followed by a digit as the separator. For this, use a regex with a look-ahead:

df = pd.read_csv(..., sep=r'\s+(?=\d)', engine='python')

Output:

              0        1        2        3
0  Power Output  12(25%)  24(50%)  12(25%)
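
To try this without a file on disk, here is a minimal sketch that feeds the sample line from the question through io.StringIO. The header=None argument is an assumption (the sample line has no header row), so adjust it to match the real file.

```python
import io

import pandas as pd

# Sample line from the question; the spacing between fields is irregular.
data = "Power Output 12(25%)   24(50%)  12(25%)\n"

# Split only on whitespace that is immediately followed by a digit.
# Regex separators need the Python parsing engine.
df = pd.read_csv(
    io.StringIO(data),
    sep=r"\s+(?=\d)",
    engine="python",
    header=None,  # assumption: the file has no header row
)

print(df)
```

Note that this relies on every value column starting with a digit and on the label never containing a word that starts with a digit; a label such as "Power 2 Output" would be split as well.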

Alternative regex, split by any group of spaces that is not followed by a non-digit: '\s+(?!\D)'
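
The two patterns can be compared outside pandas with the standard re module. This is only a sketch: the names LOOKAHEAD and NEG_LOOKAHEAD are made up for illustration, and re.split is used here as a stand-in for the separator logic, so how read_csv treats the edge case may differ.

```python
import re

# Hypothetical names, used only for this illustration.
LOOKAHEAD = re.compile(r"\s+(?=\d)")      # whitespace followed by a digit
NEG_LOOKAHEAD = re.compile(r"\s+(?!\D)")  # whitespace not followed by a non-digit

line = "Power Output 12(25%)   24(50%)  12(25%)"

# On the sample line both patterns split at the same places.
fields_a = LOOKAHEAD.split(line)
fields_b = NEG_LOOKAHEAD.split(line)
print(fields_a)
print(fields_b)

# They differ when the line ends in whitespace: the negative look-ahead
# also matches at the end of the string, because nothing follows it.
trailing = line + "  "
trailing_a = LOOKAHEAD.split(trailing)
trailing_b = NEG_LOOKAHEAD.split(trailing)
print(trailing_a)
print(trailing_b)
```

With trailing whitespace, the first pattern leaves the spaces attached to the last field, while the second splits them off and produces an extra empty field, so it is worth checking which behavior suits your data.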

Upvotes: 3

vasia

Reputation: 1172

Your code uses sep='' (an empty string). You want to use sep=r'\s+' (a regex that matches any run of whitespace).

If you want more detail, refer to the documentation for read_csv: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Upvotes: 0
