Ree
Ree

Reputation: 59

how to convert and store text file to csv

13-JUL-17                                                                       


Bank User                      Space Occupied(GB)                               
------------------------------ ------------------                               
CKYC_MNSB                              .004211426                               
CORE_AMARNATH_ASP                      8.75262451                               
CORE_AMBUJA                            6.80389404                               
CORE_AMBUJA_ASP                        10.0085449                               
CORE_ANAND_MERC_ASP                    18.9866333                               
CORE_BALOTRA                           17.8280029                               
CORE_BASODA                            4.55432129                               
CORE_CHHAPI_ASP                        11.9767456                               
CORE_DHANGDHRA_ASP                     13.1849976                               
CORE_IDAR_ASP                          13.3209229                               
CORE_JANTA_HALOL_ASP                   12.7955933                               

Bank User                      Space Occupied(GB)                               
------------------------------ ------------------                               
CORE_JHALOD_URBAN_ASP                  9.19219971                               
CORE_MANINAGAR                         5.36090088                               
CORE_MANINAGAR_ASP                     6.31414795                               
CORE_SANKHEDA                          20.4329834                               
CORE_SMCB_ANAND_ASP                    11.3191528                               
CORE_TARAPUR_ASP                       8.24627686                               
CORE_VUCB                              .000610352                               
TBA_TEMP                               5.39910889                               
TEST_DUNIA                             4.15698242                               

20 rows selected.


TABLESPACE NAME                Free Space in GB                                 
------------------------------ ----------------                                 
TBAPROJ                              33.2736816                                 

I have above text file.

How to store in CSV file with column separated?

I have load file but its very difficult to remove blank space from the file.

Upvotes: 1

Views: 84

Answers (2)

Spacedman
Spacedman

Reputation: 94212

Each line you want matches the pattern of a word made from capital letters and underscores, then spaces, then a number that has a decimal point in it. so this grep will filter those out:

> file_raw <- readLines('file.txt') 
> read.table(
    text=paste(
      file_raw[
        grep("^[A-Z_].*\\s*\\.",file_raw)
        ],
        collapse="\n"),
     sep="",head=FALSE)
                      V1           V2
1              CKYC_MNSB  0.004211426
2      CORE_AMARNATH_ASP  8.752624510
3            CORE_AMBUJA  6.803894040
4        CORE_AMBUJA_ASP 10.008544900
5    CORE_ANAND_MERC_ASP 18.986633300
6           CORE_BALOTRA 17.828002900
7            CORE_BASODA  4.554321290
8        CORE_CHHAPI_ASP 11.976745600
9     CORE_DHANGDHRA_ASP 13.184997600
10         CORE_IDAR_ASP 13.320922900
11  CORE_JANTA_HALOL_ASP 12.795593300
12 CORE_JHALOD_URBAN_ASP  9.192199710
13        CORE_MANINAGAR  5.360900880
14    CORE_MANINAGAR_ASP  6.314147950
15         CORE_SANKHEDA 20.432983400
16   CORE_SMCB_ANAND_ASP 11.319152800
17      CORE_TARAPUR_ASP  8.246276860
18             CORE_VUCB  0.000610352
19              TBA_TEMP  5.399108890
20            TEST_DUNIA  4.156982420
21               TBAPROJ 33.273681600

Note that if you are expecting any of the first tokens to not match the pattern, for example CORE_999 or lower_case then you need to adjust the pattern. But without a formal spec we can only go on what you supplied.

Upvotes: 1

Val
Val

Reputation: 7023

There might be possibly a more elegant way, but this does the trick:

# read raw file in lines
file_raw <- readLines('file.txt') 

# remove whitespace
file_trim <- trimws(file_raw,which = 'both')

# remove empty lines
file_trim <- file_trim[file_trim != '']

# sub white space with separator ,
file_csv <- gsub('\\s{2,}',',',file_trim)

In the end there will be still some things left like the -- separators and 20 rows selected., but that can be filtered out easily if you want, before writing or after reading it:

file_clean <- file_csv[!grepl('(-){3,}|rows selected',file_csv)]

write.csv(file_clean,'file_cleaned.csv')




      > read.csv('file_cleaned.csv')
    X                                x
1   1                        13-JUL-17
2   2     Bank User,Space Occupied(GB)
3   3             CKYC_MNSB,.004211426
4   4     CORE_AMARNATH_ASP,8.75262451
5   5           CORE_AMBUJA,6.80389404
6   6       CORE_AMBUJA_ASP,10.0085449
7   7   CORE_ANAND_MERC_ASP,18.9866333
8   8          CORE_BALOTRA,17.8280029
9   9           CORE_BASODA,4.55432129
10 10       CORE_CHHAPI_ASP,11.9767456
11 11    CORE_DHANGDHRA_ASP,13.1849976
12 12         CORE_IDAR_ASP,13.3209229
13 13  CORE_JANTA_HALOL_ASP,12.7955933
14 14     Bank User,Space Occupied(GB)
15 15 CORE_JHALOD_URBAN_ASP,9.19219971
16 16        CORE_MANINAGAR,5.36090088
17 17    CORE_MANINAGAR_ASP,6.31414795
18 18         CORE_SANKHEDA,20.4329834
19 19   CORE_SMCB_ANAND_ASP,11.3191528
20 20      CORE_TARAPUR_ASP,8.24627686
21 21             CORE_VUCB,.000610352
22 22              TBA_TEMP,5.39910889
23 23            TEST_DUNIA,4.15698242
24 24 TABLESPACE NAME,Free Space in GB
25 25               TBAPROJ,33.2736816

Upvotes: 1

Related Questions