vincent

Reputation: 1678

Remove specific characters in a DataFrame and CSV file

I have a CSV file that I process with a pandas DataFrame. 1) The column called left is supposed to contain only numbers:

df.iloc[:, 4]
0     2492
1     2448
2     2410
3     2382
4     2358
5     2310
6     2260
7     2208
8     2166
9     2134
10     198
11     198
12     239
13     239
14     243
15     241
16     239
17     394
18     396
19     396
20     396
21     396
22     396
23     396
24     396
Name: bottom, dtype: object

However, further down in my CSV file I noticed values like 396] or [456. How can I remove all the [ and ] characters in this column?

2) In another column:

df1.iloc[:, 0]
0       'm'
1       'i'
2       'i'
3       'l'
4       'm'
5       'u'
6       'i'
7       'l'
8       'i'
9       'l'
10      '.'
11      '3'
12      'A'
13      'M'
14      'S'
15      'U'
16      'N'
17      'A'
18      'D'
19      'R'
20      'E'
21      'S'
22      'S'
23      'E'
Name: char, dtype: object

I also noticed that some rows contain ['E' , ]'S' rather than 'E' and 'S'. How can I remove the [ and ]?

3) I have the following nested list:

df = [['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]]

I want to remove all the empty lists [], so the result should look like this:

df = [['c', 88, 118, 2872, 2902], ['g', 8, 98, 287, 202]]

Upvotes: 3

Views: 5248

Answers (1)

jezrael

Reputation: 862911

I think you can use replace with an empty string if you need to replace the values in all columns:

df = df.replace([r'\[', r'\]'], ['', ''], regex=True)

Sample:

import pandas as pd

df = pd.DataFrame({'char': ['[E', 'S]', '[E']})
print(df)
  char
0   [E
1   S]
2   [E

df = df.replace([r'\[', r'\]'], ['', ''], regex=True)
print(df)
  char
0    E
1    S
2    E
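The two patterns can also be folded into one regex character class, r'[\[\]]', which matches either a literal [ or a literal ]. A minimal sketch on the same sample data (this single-pattern form is an alternative, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'char': ['[E', 'S]', '[E']})

# [\[\]] is a character class: it matches one '[' or one ']' anywhere in a string
df = df.replace(r'[\[\]]', '', regex=True)
print(df['char'].tolist())  # ['E', 'S', 'E']
```

The raw string (r'...') keeps Python from treating the backslashes as string escapes before the regex engine sees them.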

If you only need to replace in one column:

df['char'] = df['char'].replace([r'\[', r'\]'], ['', ''], regex=True)
print(df)
  char
0    E
1    S
2    E
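For the numeric column from question 1, stripping the brackets still leaves strings (dtype: object), so a conversion step is needed afterwards. A minimal sketch, assuming the column is named bottom (as the Name: line in the question suggests) and using a small hypothetical CSV excerpt instead of the real file:

```python
import io

import pandas as pd

# Hypothetical CSV excerpt: only the 'bottom' column, with stray brackets
csv_text = "bottom\n2492\n396]\n[456\n239\n"
df = pd.read_csv(io.StringIO(csv_text))

# The brackets force the column to dtype object, so strip them as text first;
# astype(str) guards against values pandas may already have parsed as numbers
cleaned = df['bottom'].astype(str).str.replace(r'[\[\]]', '', regex=True)

# Convert to numbers; errors='coerce' turns anything still non-numeric into NaN
df['bottom'] = pd.to_numeric(cleaned, errors='coerce')
print(df['bottom'].tolist())  # [2492, 396, 456, 239]
```

With errors='coerce', a value that is still not a number after cleaning becomes NaN (and the column becomes float) instead of raising, which makes leftover bad rows easy to find with isna().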

To remove the empty lists, use a list comprehension:

L = [['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]]

L1 = [x for x in L if len(x) != 0]
print(L1)
[['c', 88, 118, 2872, 2902], ['g', 8, 98, 287, 202]]
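Because an empty list is falsy in Python, the length check can be shortened to a plain truth test. A minimal sketch of that equivalent form:

```python
L = [['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]]

# "if x" is False for an empty list, so only the non-empty rows are kept
L1 = [x for x in L if x]
print(L1)  # [['c', 88, 118, 2872, 2902], ['g', 8, 98, 287, 202]]
```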

And to remove rows that are entirely NaN, use dropna:

df = pd.DataFrame([['c', 88, 118, 2872, 2902], [], ['g', 8, 98, 287, 202]])
print(df)
      0     1      2       3       4
0     c  88.0  118.0  2872.0  2902.0
1  None   NaN    NaN     NaN     NaN
2     g   8.0   98.0   287.0   202.0

print(df.dropna(how='all'))
   0     1      2       3       4
0  c  88.0  118.0  2872.0  2902.0
2  g   8.0   98.0   287.0   202.0
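Note that dropna keeps the original row labels (0 and 2 above). If consecutive labels are wanted, reset_index(drop=True) can renumber them. A minimal sketch; here the empty row is written out explicitly as a row of None values rather than relying on pandas padding a ragged list:

```python
import pandas as pd

# The all-None middle row stands in for the empty list in the question
df = pd.DataFrame([['c', 88, 118, 2872, 2902],
                   [None, None, None, None, None],
                   ['g', 8, 98, 287, 202]])

# how='all' drops only rows where every value is missing;
# reset_index(drop=True) renumbers 0..n-1 and discards the old labels
out = df.dropna(how='all').reset_index(drop=True)
print(out.index.tolist())  # [0, 1]
print(out[0].tolist())  # ['c', 'g']
```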

Upvotes: 7
