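smallpig posted on 2007-10-25 21:12

Extracting content from files with a program or script

Say I have an .htm file containing <head> and <body>. I need to delete everything outside <body> and keep what is inside <body>. Does anyone know how to write this?

In Java, or as a batch script.

powerwind posted on 2007-10-25 21:34

First make sure the string "<body>" does not appear anywhere outside the <body> element. Then read the whole file into a string and call: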

str.substring(str.indexOf("<body>"),str.lastIndexOf("</body>"));
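
To see what that one-liner actually returns, here is a minimal sketch. The names `BodySlice` and `sliceBody` are made up for illustration, and it assumes the tag is written exactly as lowercase `<body>` with no attributes. Note that the result still includes the opening `<body>` tag, and that `indexOf` returns -1 when a tag is missing, so the sketch checks for that before calling `substring`.

```java
class BodySlice {
    // Hypothetical helper: returns the text from "<body>" up to, but not
    // including, the last "</body>"; returns null if either tag is missing.
    static String sliceBody(String str) {
        int start = str.indexOf("<body>");
        int end = str.lastIndexOf("</body>");
        if (start == -1 || end == -1 || end < start) {
            return null;
        }
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head><body><p>hi</p></body></html>";
        System.out.println(sliceBody(html)); // prints "<body><p>hi</p>"
    }
}
```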

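smallpig posted on 2007-10-25 21:49

Could you go into a bit more detail? I really don't understand... And it needs to work on a batch of files...

smallpig posted on 2007-10-25 21:55

P.S. The <body> and </body> tags themselves also need to be removed, leaving only the content inside <body>.

powerwind posted on 2007-10-25 22:41

Does the OP know how to read and write files in Java?

Assuming yes: read the file's contents into a string, and the rest is just string manipulation; then write the result back to the file.

As for batch processing, just walk a folder recursively and handle each file.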
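
The read, modify, write-back cycle described above could look roughly like this. It is only a sketch: `RewriteFile`, `transform` and `rewrite` are hypothetical names, the file is assumed to be UTF-8, and `transform` is a placeholder (it only trims whitespace) standing in for whatever string manipulation is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class RewriteFile {
    // Placeholder for the string manipulation step; here it only trims whitespace.
    static String transform(String content) {
        return content.trim();
    }

    // Reads the whole file into a string, transforms it, and overwrites the file.
    // Assumes the file is encoded as UTF-8.
    static void rewrite(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, transform(content).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java RewriteFile <file>");
            return;
        }
        rewrite(Paths.get(args[0]));
    }
}
```

Because this overwrites the input in place, it may be safer to try it on a copy of the files first.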

smallpig posted on 2007-10-25 22:55

The <body> and </body> tags also need to be removed. How do I remove them?

powerwind posted on 2007-10-25 23:14

str.substring(str.indexOf("<body")+6,str.lastIndexOf("</body>"));

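If 6 doesn't fit, try 7 or 5, or 1.

wool王 posted on 2007-10-26 00:11

Personally, I'd recommend the most efficient solution: just get the poster in #2 to sort it out for you. Whether you pay with a meal or with cash is up to the OP. You earn plenty every month anyway, so you can afford to share a little with a fellow alum.

smallpig posted on 2007-10-26 00:14

I'm not ashamed to ask my betters:

How do I do it in batch, i.e. loop over the files in a directory?

powerwind posted on 2007-10-26 00:46

Strongly agree with #8.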



import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class Test {

    // Every regular file found under the starting directory
    private static List<File> fileList = new ArrayList<File>();

    public static void main(String[] args) throws Exception {
        getFiles(new File("E:\\java"));
        System.out.println(fileList.size());
    }

    public static void getFiles(File file) {
        if (file.isDirectory()) {
            // Get the files and subdirectories in this directory
            File[] files = file.listFiles();
            // listFiles() returns null if the directory cannot be read
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    getFiles(files[i]);
                }
            }
        } else {
            fileList.add(file);
            System.out.println(file.getName());
        }
    }
}

[ Last edited by powerwind on 2007-10-26 21:06 ]
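
Since only HTML files matter here, the same recursive walk can be narrowed to them. The following is a sketch under assumptions: `HtmlFileFinder` and `collect` are hypothetical names, and "HTML file" is taken to mean a name ending in .htm or .html, which the thread does not spell out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HtmlFileFinder {
    // Recursively collects files whose names end in .htm or .html (any case).
    static void collect(File file, List<File> result) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) { // null if the directory cannot be read
                for (File child : children) {
                    collect(child, result);
                }
            }
        } else {
            String name = file.getName().toLowerCase(Locale.ROOT);
            if (name.endsWith(".htm") || name.endsWith(".html")) {
                result.add(file);
            }
        }
    }

    public static void main(String[] args) {
        // Start from the directory given on the command line, or the current one.
        List<File> found = new ArrayList<File>();
        collect(new File(args.length > 0 ? args[0] : "."), found);
        for (File f : found) {
            System.out.println(f.getPath());
        }
    }
}
```

Passing the result list as a parameter instead of using a static field keeps the method reusable across several starting directories.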

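wjlsunshine posted on 2007-10-26 01:17

Is the OP trying to build an article-scraping feature?

wool王 posted on 2007-10-26 09:37

Just move it over to UNIX and write a shell script.

The OP could ask 碌碌宾宾 and the others to write it for you, hahaha. If 碌碌 and co. took it on, they'd have it written for you in 5 minutes.

smallpig posted on 2007-10-26 10:43

You might as well tell me to go die.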

smallpig posted on 2007-10-26 10:56

Originally posted by powerwind on 2007-10-25 23:14:
str.substring(str.indexOf("<body")+6,str.lastIndexOf("</body>"));

If 6 doesn't fit, try 7 or 5, or 1.

This has a big problem.

What if the <body> tag looks like <body leftmargin="0" topmargin="0">? For example:

var str = "sdff<body leftmargin=\"0\" topmargin=\"0\">";
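
One way to cope with attributes on the tag is to stop guessing a fixed offset and instead search for the end of the opening tag. The sketch below is an illustration, not anyone's posted solution: `BodyContent` and `extract` are hypothetical names, it uses `java.util.regex` in place of the `indexOf` arithmetic, and it assumes no attribute value contains a `>` character and that "<body" does not appear earlier in the file (in a script, for instance).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BodyContent {
    // Opening tag with optional attributes, e.g. <body leftmargin="0">
    private static final Pattern OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Closing tag, e.g. </body> or </BODY>
    private static final Pattern CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);

    // Returns the text between the opening tag and the last closing tag,
    // or null if either tag is missing.
    static String extract(String html) {
        Matcher open = OPEN.matcher(html);
        if (!open.find()) {
            return null;
        }
        int start = open.end(); // first character after the opening tag
        int end = -1;
        Matcher close = CLOSE.matcher(html);
        while (close.find()) { // keep the position of the last closing tag
            if (close.start() >= start) {
                end = close.start();
            }
        }
        return end == -1 ? null : html.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<html><head></head>"
                + "<BODY leftmargin=\"0\" topmargin=\"0\"><p>hi</p></body></html>";
        System.out.println(extract(html)); // prints "<p>hi</p>"
    }
}
```

For anything beyond simple pages, a real HTML parser would be more reliable than pattern matching on tags.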

(-.-)游謉 posted on 2007-10-26 11:07

碌碌宾宾, nice name.

smallpig posted on 2007-10-26 13:05

/*
 * Created on Oct 26, 2007
 */
import java.io.*;

/**
 * Copies every file in the directory given as args[0] into the folder
 * "Result_Of_Fitering", keeping only the lines between <body ...> and </body>.
 *
 * @author 43361813
 */
public class FiterContentOfHTML {

    public static void main(String[] args) throws IOException {

        File directory = new File(args[0]);

        if (directory.isDirectory()) {

            File targetDirectory = new File("Result_Of_Fitering");
            boolean targetDirExist = true;
            if (!targetDirectory.exists()) {
                targetDirExist = targetDirectory.mkdir();
            }

            String[] fileNames = directory.list();
            for (int i = 0; i < fileNames.length; i++) {
                String fileName = directory.getPath() + File.separator + fileNames[i];
                File file = new File(fileName);

                if (file.isFile()) {
                    BufferedReader br = new BufferedReader(new FileReader(fileName));
                    String oneLineOfContent = null;
                    StringBuffer fullContent = new StringBuffer();
                    boolean afterBodyTag = false;
                    boolean bodyTagInMultiLines = false;
                    while ((oneLineOfContent = br.readLine()) != null) {

                        // For "<body>": the opening tag ends on this line, either
                        // because it starts here too or because it started earlier
                        if ((oneLineOfContent.toUpperCase().indexOf("<BODY") != -1 && oneLineOfContent.indexOf(">") != -1) ||
                            (bodyTagInMultiLines && oneLineOfContent.indexOf(">") != -1)) {

                            afterBodyTag = true;
                            bodyTagInMultiLines = false;
                            fullContent.delete(0, fullContent.length());
                            continue;
                        }
                        else if (oneLineOfContent.toUpperCase().indexOf("<BODY") != -1) {
                            // The opening tag starts here but continues on later lines
                            bodyTagInMultiLines = true;
                            continue;
                        }

                        // For "</body>"
                        int locationOfBodyTag = oneLineOfContent.toUpperCase().indexOf("</BODY>");
                        if (locationOfBodyTag != -1) {
                            fullContent.append(oneLineOfContent.subSequence(0, locationOfBodyTag));
                            // If subSequence gives an error, use substring instead
                            afterBodyTag = false;
                        }

                        if (afterBodyTag) {
                            fullContent.append(oneLineOfContent);
                            fullContent.append("\r\n");
                        }
                    }
                    br.close();

                    if (targetDirExist) {
                        String targetFileName = targetDirectory.getPath() + File.separator + fileNames[i];
                        BufferedWriter bw = new BufferedWriter(new FileWriter(targetFileName));
                        bw.write("\r\n" +
                            fullContent.toString());
                        bw.close();
                    }
                    else {
                        System.out.println("The target folder is not created! Please rerun the program! ^__^");
                    }
                    System.out.println(fileNames[i] + " is filtered content completely! ^__^");
                }
            }
        }
        else {
            System.out.println("The directory name has something wrong! Please input once again! ^__^");
        }
    }
}

[ Last edited by smallpig on 2007-10-26 17:26 ]

smallpig posted on 2007-10-26 16:07

java.sun.com

www.eclipse.org

活在阳光下 posted on 2007-10-26 17:04

Do it yourself and you'll never go short.
You don't need other people to do everything for you, do you?

smallpig posted on 2007-10-26 17:25


/*
 * Created on Oct 26, 2007
 */
import java.io.*;

/**
 * Copies every file in the directory given as args[0] into the folder
 * "Result_Of_Fitering", keeping only the lines between <body ...> and </body>,
 * dropping HTML comments and prepending a JSP page directive.
 *
 * @author 43361813
 */
public class FiterContentOfHTML {

    public static void main(String[] args) throws IOException {

        File directory = new File(args[0]);

        if (directory.isDirectory()) {

            File targetDirectory = new File("Result_Of_Fitering");
            boolean targetDirExist = true;
            if (!targetDirectory.exists()) {
                targetDirExist = targetDirectory.mkdir();
            }

            String[] fileNames = directory.list();
            for (int i = 0; i < fileNames.length; i++) {
                String fileName = directory.getPath() + File.separator + fileNames[i];
                File file = new File(fileName);

                if (file.isFile()) {
                    BufferedReader br = new BufferedReader(new FileReader(fileName));
                    String oneLineOfContent = null;
                    StringBuffer fullContent = new StringBuffer();
                    boolean afterBodyTag = false;
                    boolean bodyTagInMultiLines = false;
                    boolean commentInMultiLines = false;
                    while ((oneLineOfContent = br.readLine()) != null) {

                        // For "<body>": the opening tag ends on this line, either
                        // because it starts here too or because it started earlier.
                        // Anything else on this line is skipped.
                        if ((oneLineOfContent.toUpperCase().indexOf("<BODY") != -1 && oneLineOfContent.indexOf(">") != -1) ||
                            (bodyTagInMultiLines && oneLineOfContent.indexOf(">") != -1)) {

                            afterBodyTag = true;
                            bodyTagInMultiLines = false;
                            fullContent.delete(0, fullContent.length());
                            continue;
                        }
                        else if (oneLineOfContent.toUpperCase().indexOf("<BODY") != -1) {
                            // The opening tag starts here but continues on later lines
                            bodyTagInMultiLines = true;
                            continue;
                        }

                        // For "</body>": keep the text before the tag, then stop copying
                        int locationOfBodyTag = oneLineOfContent.toUpperCase().indexOf("</BODY>");
                        if (locationOfBodyTag != -1) {
                            fullContent.append(oneLineOfContent.substring(0, locationOfBodyTag));
                            afterBodyTag = false;
                        }

                        if (afterBodyTag) {

                            // For "<!-- -->"
                            int locationOfCommentBeginningTag = oneLineOfContent.indexOf("<!--");
                            int locationOfCommentEndingTag = oneLineOfContent.indexOf("-->");
                            if (locationOfCommentBeginningTag != -1 && locationOfCommentEndingTag != -1) {
                                // Comment opens and closes on this line: keep only the text before it
                                fullContent.append(oneLineOfContent.substring(0, locationOfCommentBeginningTag));
                                continue;
                            }
                            else if (locationOfCommentBeginningTag != -1) {
                                // Comment opens here and continues on later lines
                                fullContent.append(oneLineOfContent.substring(0, locationOfCommentBeginningTag));
                                commentInMultiLines = true;
                                continue;
                            }
                            else if (locationOfCommentEndingTag != -1) {
                                // Multi-line comment closes here: keep the text after it
                                fullContent.append(oneLineOfContent.substring(locationOfCommentEndingTag + 3));
                                commentInMultiLines = false;
                                continue;
                            }

                            // Ordinary line: copy it unless we are inside a comment
                            if (!commentInMultiLines) {
                                fullContent.append(oneLineOfContent);
                                fullContent.append("\r\n");
                            }
                        }
                    }
                    br.close();

                    if (targetDirExist) {
                        String targetFileName = targetDirectory.getPath() + File.separator + fileNames[i];
                        BufferedWriter bw = new BufferedWriter(new FileWriter(targetFileName));
                        bw.write("<%@page language=\"java\" contentType=\"text/html; charset=utf-8\" pageEncoding=\"utf-8\"%>\r\n" +
                            fullContent.toString());
                        bw.close();
                    }
                    else {
                        System.out.println("The target folder is not created! Please rerun the program! ^__^");
                    }
                    System.out.println(fileNames[i] + " is filtered content completely! ^__^");
                }
            }
        }
        else {
            System.out.println("The directory name has something wrong! Please input once again! ^__^");
        }
    }
}

smallpig posted on 2007-10-26 17:25

Thanks to everyone above for the help.

Special thanks to PPT for the strong support.