当数据量较大时,我们会以文本形式进行存储,每条数据间用固定符号分割。当需要分析这些数据时,数据量小我们还可以拷贝进excel中,在excel中做分列和分行操作,数据量过大时,贴入excel会超过excel的最大列,无法在excel中手动操作。
python可以十分方便的解决此类重复工作,以下是我的解决过程,写出以供大家参考,此操作需要python环境。
1、txt文本中的数据样例如下(实际处理的文件有39.4M):
2019-11-01 05:30:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 05:00:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 04:30:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 04:00:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 03:30:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 03:00:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 02:30:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 02:00:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 01:30:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 01:00:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 00:30:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2019-11-01 00:00:00 [{"VOCs":0.0,"x.y":"31.104637,120.869278","NOx":0.0,"latitude":31.104637,"CO":0.0,"PM":0.0,"longitude":120.869278},{"VOCs":0.0,"x.y":"31.109129,120.869278","NOx":0.0,"latitude":31.109129,"CO":0.0,"PM":0.0,"longitude":120.869278}]
2、python处理过程如下
书写python时务必要注意缩进,缩进不同,代表的意思完全不同。
# -*-coding:utf-8 -*-
import csv
import time
# 打开要写入的csv文件,预备写入用,没有的话会新建一个csv文件
#with open 会自动关闭打开的文件,所以写入操作须在打开文件时操作(不与withopen同一缩进级别)
with open('11-01_1.csv','w') as csvfile:
#创建一个csvwriter对象,用于写文件
spamwriter = csv.writer(csvfile, dialect='excel')
# 读要转换的txt文件,文件每行各词间以字符分隔
with open('11-01_1.txt','r') as filein:
#循环打开的文件中的行
for line in filein:
#文件中行的内容以'},'分割
line_list = line.strip('\n').split('},') #我这里的数据之间是以 }, 间隔的
#循环行中的内容
for colum in line_list:
#行中的内容以;分割
columtxt = colum.split(';')
#行中以分号分割的部分写为scv即excel中的一行
spamwriter.writerow(columtxt)
#双循环及写文件完成,关闭打开的文件
filein.close()
#双循环及写文件完成,关闭打开的文件
csvfile.close()
image.gif
3、处理结果如下:
test.csv4、优化
可进一步对此数据进行分列显示,最终分出数据的值到一个单独的列,直接对数据进行分析