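Extracting part of a file's content with a program or script
Say I have an .htm file containing <head> and <body>. I need to delete everything outside <body> and keep what is inside <body>. Does anyone know how to write this? Either Java or a batch-processing script would do.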
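First make sure the content outside <body> does not itself contain "<body>". Then read the whole file into a string and call
str.substring(str.indexOf("<body>"),str.lastIndexOf("</body>"));
Could you go into a bit more detail? I really don't understand this... and it has to work on a whole batch of files...
PS: the <body> and </body> tags themselves also have to be removed, leaving only the content inside <body>.
OP, do you know how to read and write files in Java?
Assuming you do: read a file's content into a string, and the rest is just string manipulation; then write the result back to the file.
As for batch processing, just recursively read the files in a folder.
The <body> and </body> tags also have to be removed. How do I remove them?
str.substring(str.indexOf("<body")+6,str.lastIndexOf("</body>"));
If 6 doesn't fit, try 7 or 5, or 1.
Personally I'd recommend the most efficient solution: get the poster in #2 to do it for you. Whether you buy him dinner or pay cash is up to you, OP. You earn plenty every month anyway, so sharing a little with a fellow alumnus won't hurt.
Not too proud to ask: how do I do the batch part, i.e. loop over the files in a directory?
Strongly agree with #8.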
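For anyone following along, the read / substring / write-back steps described in the replies above could be sketched roughly like this. It is only a minimal sketch under some assumptions: the class name BodySubstring, the two command-line paths and the UTF-8 encoding are made up for illustration, and it only handles a plain lowercase <body> tag with no attributes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class BodySubstring {
    // Returns the text between "<body>" and the last "</body>".
    // Assumes a lowercase <body> tag without attributes.
    static String extractBody(String html) {
        int open = html.indexOf("<body>");
        int close = html.lastIndexOf("</body>");
        if (open < 0 || close < open) {
            return html; // no usable <body>...</body> pair: leave the text unchanged
        }
        // "<body>" is 6 characters long, which is where the +6 comes from.
        return html.substring(open + "<body>".length(), close);
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);  // hypothetical input file
        Path out = Paths.get(args[1]); // hypothetical output file
        // UTF-8 is an assumption; use whatever encoding the pages really have.
        String html = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, extractBody(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

Running it as `java BodySubstring page.htm page_body.htm` would write only the inner content to the second file. Batch processing is then a matter of calling extractBody for every file found in the directory.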
import java.util.*;
import java.io.*;
public class Test {
    public static void main(String[] args) throws Exception {
        getFiles(new File("E:\\java"));
        System.out.println(fileList.size());
    }
    // Every regular file found under the start directory ends up in this list
    private static List<File> fileList = new ArrayList<File>(159);
    public static void getFiles(File file) {
        if (file.isDirectory()) {
            // Get the files and subdirectories in this directory
            File[] files = file.listFiles();
            // listFiles() returns null if the directory cannot be read
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    // Recurse into each entry
                    getFiles(files[i]);
                }
            }
        } else {
            fileList.add(file);
            System.out.println(file.getName());
        }
    }
}
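[ Last edited by powerwind on 2007-10-26 21:06 ]
OP, are you trying to build an article-scraping feature?
Just move it over to UNIX and write a shell script.
OP, you could ask Lulu, Binbin and the rest to write it for you, haha. If Lulu takes it on, it will be done in five minutes.
You might as well tell me to go die.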
Originally posted by powerwind on 2007-10-25 23:14
str.substring(str.indexOf("<body")+6,str.lastIndexOf("</body>"));
If 6 doesn't fit, try 7 or 5, or 1.
There is one big problem with this:
what if the <body> tag looks like <body leftmargin="0" topmargin="0">?
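One possible way around the attribute problem is to stop counting characters and instead look for the ">" that ends the opening tag. The sketch below does that with java.util.regex and also ignores upper/lower case. The class name BodyWithAttributes is made up for illustration, and it assumes no attribute value of the body tag contains a ">" character, which a real HTML parser would handle but this does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BodyWithAttributes {
    // "<body", a word boundary, then anything up to the first ">" (may span lines).
    private static final Pattern OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);

    static String extractBody(String html) {
        Matcher open = OPEN.matcher(html);
        if (!open.find()) {
            return html; // no opening body tag: leave the text unchanged
        }
        int start = open.end(); // first character after the opening tag

        // Search only after the opening tag and remember the last closing tag.
        Matcher close = CLOSE.matcher(html).region(start, html.length());
        int end = -1;
        while (close.find()) {
            end = close.start();
        }
        // If the closing tag is missing, keep everything after the opening tag.
        return end < 0 ? html.substring(start) : html.substring(start, end);
    }

    public static void main(String[] args) {
        String sample = "<html><body leftmargin=\"0\" topmargin=\"0\"><p>x</p></body></html>";
        System.out.println(extractBody(sample));
    }
}
```

Because the length of the opening tag is taken from the match itself, there is no +6 to guess any more.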
<script>
var str="sdff<body leftmargin=\"0\" topmargin=\"0\">What about this form?
Lulu and Binbin, nice names.
/*
* Created on Oct 26, 2007
*
* To change the template for this generated file go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/
import java.io.*;
/**
* @author 43361813
*
* To change the template for this generated type comment go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/
public class FiterContentOfHTML {
    public static void main(String[] args) throws IOException {
        // args[0] is the directory that holds the HTML files to filter
        File directory = new File(args[0]);
        if (directory.isDirectory()) {
            File targetDirectory = new File("Result_Of_Fitering");
            boolean targetDirExist = true;
            if (!targetDirectory.exists()) {
                targetDirExist = targetDirectory.mkdir();
            }
            String[] fileNames = directory.list();
            for (int i = 0; i < fileNames.length; i++) {
                String fileName = directory.getPath() + File.separator + fileNames[i];
                File file = new File(fileName);
                if (file.isFile()) {
                    BufferedReader br = new BufferedReader(new FileReader(fileName));
                    String oneLineOfContent = null;
                    StringBuffer fullContent = new StringBuffer();
                    boolean afterBodyTag = false;
                    boolean bodyTagInMultiLines = false;
                    while ((oneLineOfContent = br.readLine()) != null) {
                        // For "<body>": the opening tag ends on this line
                        if ((oneLineOfContent.toUpperCase().indexOf("<BODY") != -1 && oneLineOfContent.indexOf(">") != -1) ||
                                (bodyTagInMultiLines && oneLineOfContent.indexOf(">") != -1)) {
                            afterBodyTag = true;
                            bodyTagInMultiLines = false;
                            fullContent.delete(0, fullContent.length());
                            continue;
                        }
                        // The opening tag starts here but continues on later lines
                        else if (oneLineOfContent.toUpperCase().indexOf("<BODY") != -1) {
                            bodyTagInMultiLines = true;
                            continue;
                        }
                        // For "</body>"
                        int locationOfBodyTag = oneLineOfContent.toUpperCase().indexOf("</BODY>");
                        if (locationOfBodyTag != -1) {
                            fullContent.append(oneLineOfContent.subSequence(0, locationOfBodyTag));
                            // If subSequence gives an error, substring can be used instead
                            afterBodyTag = false;
                        }
                        if (afterBodyTag) {
                            fullContent.append(oneLineOfContent);
                            fullContent.append("\r\n");
                        }
                    }
                    br.close();
                    if (targetDirExist) {
                        String targetFileName = targetDirectory.getPath() + File.separator + fileNames[i];
                        BufferedWriter bw = new BufferedWriter(new FileWriter(targetFileName));
                        bw.write("<%@page language=\"java\" contentType=\"text/html; charset=utf-8\" pageEncoding=\"utf-8\"%>\r\n" +
                                fullContent.toString());
                        bw.close();
                    }
                    else {
                        System.out.println("The target folder is not created! Please rerun the program! ^__^");
                    }
                    System.out.println(fileNames[i] + " is filtered content completely! ^__^");
                }
            }
        }
        else {
            System.out.println("The directory name has something wrong! Please input once again! ^__^");
        }
    }
}
[ Last edited by smallpig on 2007-10-26 17:26 ]
java.sun.com
www.eclipse.org
Do it yourself and you will want for nothing.
You don't need other people to do everything for you, do you?
/*
* Created on Oct 26, 2007
*
* To change the template for this generated file go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/
import java.io.*;
/**
* @author 43361813
*
* To change the template for this generated type comment go to
* Window>Preferences>Java>Code Generation>Code and Comments
*/
public class FiterContentOfHTML {
    public static void main(String[] args) throws IOException {
        // args[0] is the directory that holds the HTML files to filter
        File directory = new File(args[0]);
        if (directory.isDirectory()) {
            File targetDirectory = new File("Result_Of_Fitering");
            boolean targetDirExist = true;
            if (!targetDirectory.exists()) {
                targetDirExist = targetDirectory.mkdir();
            }
            String[] fileNames = directory.list();
            for (int i = 0; i < fileNames.length; i++) {
                String fileName = directory.getPath() + File.separator + fileNames[i];
                File file = new File(fileName);
                if (file.isFile()) {
                    BufferedReader br = new BufferedReader(new FileReader(fileName));
                    String oneLineOfContent = null;
                    StringBuffer fullContent = new StringBuffer();
                    boolean afterBodyTag = false;
                    boolean bodyTagInMultiLines = false;
                    boolean commentInMultiLines = false;
                    while ((oneLineOfContent = br.readLine()) != null) {
                        // For "<body>": the opening tag ends on this line
                        if ((oneLineOfContent.toUpperCase().indexOf("<BODY") != -1 && oneLineOfContent.indexOf(">") != -1) ||
                                (bodyTagInMultiLines && oneLineOfContent.indexOf(">") != -1)) {
                            afterBodyTag = true;
                            bodyTagInMultiLines = false;
                            fullContent.delete(0, fullContent.length());
                            continue;
                        }
                        // The opening tag starts here but continues on later lines
                        else if (oneLineOfContent.toUpperCase().indexOf("<BODY") != -1) {
                            bodyTagInMultiLines = true;
                            continue;
                        }
                        // For "</body>"
                        int locationOfBodyTag = oneLineOfContent.toUpperCase().indexOf("</BODY>");
                        if (locationOfBodyTag != -1) {
                            fullContent.append(oneLineOfContent.substring(0, locationOfBodyTag));
                            afterBodyTag = false;
                        }
                        if (afterBodyTag) {
                            // For "<!-- -->"
                            int locationOfCommentBeginningTag = oneLineOfContent.indexOf("<!--");
                            int locationOfCommentEndingTag = oneLineOfContent.indexOf("-->");
                            // Comment opens and closes on this line: keep only the text before it
                            if (locationOfCommentBeginningTag != -1 && locationOfCommentEndingTag != -1) {
                                fullContent.append(oneLineOfContent.substring(0, locationOfCommentBeginningTag));
                                continue;
                            }
                            // Comment opens here and continues on later lines
                            else if (locationOfCommentBeginningTag != -1) {
                                fullContent.append(oneLineOfContent.substring(0, locationOfCommentBeginningTag));
                                commentInMultiLines = true;
                                continue;
                            }
                            // Multi-line comment closes here: keep the text after "-->"
                            else if (locationOfCommentEndingTag != -1) {
                                fullContent.append(oneLineOfContent.substring(locationOfCommentEndingTag + 3));
                                commentInMultiLines = false;
                                continue;
                            }
                            if (!commentInMultiLines) {
                                fullContent.append(oneLineOfContent);
                                fullContent.append("\r\n");
                            }
                        }
                    }
                    br.close();
                    if (targetDirExist) {
                        String targetFileName = targetDirectory.getPath() + File.separator + fileNames[i];
                        BufferedWriter bw = new BufferedWriter(new FileWriter(targetFileName));
                        bw.write("<%@page language=\"java\" contentType=\"text/html; charset=utf-8\" pageEncoding=\"utf-8\"%>\r\n" +
                                fullContent.toString());
                        bw.close();
                    }
                    else {
                        System.out.println("The target folder is not created! Please rerun the program! ^__^");
                    }
                    System.out.println(fileNames[i] + " is filtered content completely! ^__^");
                }
            }
        }
        else {
            System.out.println("The directory name has something wrong! Please input once again! ^__^");
        }
    }
}
Thanks to everyone above for the help.
Special thanks to PPT for the great support.